summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-02-25NFS: Fix a potential file corruption issue when writingTrond Myklebust
patch 5d47a35600270e7115061cb1320ee60ae9bcb6b8 in mainline. If the inode is flagged as having an invalid mapping, then we can't rely on the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation" write optimisation in nfs_updatepage(), since that will cause NFS to write out areas of the page that are no longer guaranteed to be up to date. A potential corruption could occur in the following scenario: client 1 client 2 =============== =============== fd=open("f",O_CREAT|O_WRONLY,0644); write(fd,"fubar\n",6); // cache last page close(fd); fd=open("f",O_WRONLY|O_APPEND); write(fd,"foo\n",4); close(fd); fd=open("f",O_WRONLY|O_APPEND); write(fd,"bar\n",4); close(fd); ----- The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because client 2 does not update the cached page after re-opening the file for write. Instead it keeps it marked as PageUptodate() until someone calls invaldate_inode_pages2() (typically by calling read()). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-10splice: fix user pointer access in get_iovec_page_array()Bastian Blank
patch 712a30e63c8066ed84385b12edbfb804f49cbc44 in mainline. Commit 8811930dc74a503415b35c4a79d14fb0b408a361 ("splice: missing user pointer access verification") added the proper access_ok() calls to copy_from_user_mmap_sem() which ensures we can copy the struct iovecs from userspace to the kernel. But we also must check whether we can access the actual memory region pointed to by the struct iovec to fix the access checks properly. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08splice: missing user pointer access verification (CVE-2008-0009/10)Jens Axboe
patch 8811930dc74a503415b35c4a79d14fb0b408a361 in mainline. vmsplice_to_user() must always check the user pointer and length with access_ok() before copying. Likewise, for the slow path of copy_from_user_mmap_sem() we need to check that we may read from the user region. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Cc: Wojciech Purczynski <cliph@research.coseinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08vm audit: add VM_DONTEXPAND to mmap for drivers that need it (CVE-2008-0007)Nick Piggin
Drivers that register a ->fault handler, but do not range-check the offset argument, must set VM_DONTEXPAND in the vm_flags in order to prevent an expanding mremap from overflowing the resource. I've audited the tree and attempted to fix these problems (usually by adding VM_DONTEXPAND where it is not obvious). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08knfsd: Allow NFSv2/3 WRITE calls to succeed when krb5i etc is used.NeilBrown
patch ba67a39efde8312e386c6f603054f8945433d91f in mainline. When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple of 8 bytes. This can make the request look slightly longer than it really is. As of f34b95689d2ce001c "The NFSv2/NFSv3 server does not handle zero length WRITE request correctly", the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't the right length, so krb5i (for example) WRITE requests can get lost. This patch relaxes the appropriate test and enhances the related comment. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08CIFS: Respect umask when using POSIX mkdirSteve French
patch a8cd925f74c3b1b6d1192f9e75f9d12cc2ab148a in mainline. [CIFS] Respect umask when using POSIX mkdir When making a directory with POSIX mkdir calls, cifs_mkdir does not respect the umask. This patch causes the new POSIX mkdir to create with the right mode Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08vfs: coredumping fix (CVE-2007-6206)Ingo Molnar
vfs: coredumping fix patch c46f739dd39db3b07ab5deb4e3ec81e1c04a91af in mainline fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043 only allow coredumping to the same uid that the coredumping task runs under. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Alan Cox <alan@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-14Use access mode instead of open flags to determine needed permissions ↵Linus Torvalds
(CVE-2008-0001) patch 974a9f0b47da74e28f68b9c8645c3786aa5ace1a in mainline Way back when (in commit 834f2a4a1554dc5b2598038b3fe8703defcbe467, aka "VFS: Allow the filesystem to return a full file pointer on open intent" to be exact), Trond changed the open logic to keep track of the original flags to a file open, in order to pass down the the intent of a dentry lookup to the low-level filesystem. However, when doing that reorganization, it changed the meaning of namei_flags, and thus inadvertently changed the test of access mode for directories (and RO filesystem) to use the wrong flag. So fix those test back to use access mode ("acc_mode") rather than the open flag ("flag"). Issue noticed by Bill Roman at Datalight. Reported-and-tested-by: Bill Roman <bill.roman@datalight.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14XFS: Make xfsbufd threads freezableRafael J. Wysocki
patch 978c7b2ff49597ab76ff7529a933bd366941ac25 in mainline Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69 that did not introduce the necessary call to set_freezable() in xfs/linux-2.6/xfs_buf.c . SGI-PV: 974224 SGI-Modid: xfs-linux-melb:xfs-kern:30203a Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Cc: Oliver Pintr <oliver.pntr@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-26reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty fileFengguang Wu
patch c06a018fa5362fa9ed0768bd747c0fab26bc8849 in mainline. This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the same way. Reiserfs could accumulate dirty sub-page-size files until umount time. They cannot be synced to disk by pdflush routines or explicit `sync' commands. Only `umount' can do the trick. The direct cause is: the dirty page's PG_dirty is wrongly _cleared_. Call trace: [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340 [<ffffffff802a187c>] __fput+0xcc/0x1b0 [<ffffffff802a1ba6>] fput+0x16/0x20 [<ffffffff8029e676>] filp_close+0x56/0x90 [<ffffffff8029fe0d>] sys_close+0xad/0x110 [<ffffffff8020c41e>] system_call+0x7e/0x83 Fix the bug by removing the cancel_dirty_page() call. Tests show that it causes no bad behaviors on various write sizes. === for the patient === Here are more detailed demonstrations of the problem. 1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to; and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# cat > /test/tiny [T1] hi [T2] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /home/wfg# echo /test/tiny > /proc/filecache [T1] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___UD__Bd_ 2 [T2] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___U___Bd_ 2 2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# echo hi > /tmp/hi [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test [T2] hi [T3] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /proc/4397# cd /proc/`pidof cp` [T1] root /proc/4713# cat io rchar: 8396 wchar: 3 syscr: 20 syscw: 1 read_bytes: 0 write_bytes: 20480 cancelled_write_bytes: 4096 [T2] root /proc/4713# cat io rchar: 8399 wchar: 6 syscr: 21 syscw: 2 read_bytes: 0 write_bytes: 24576 cancelled_write_bytes: 4096 //Question: the 'write_bytes' is a bit more than expected ;-) Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Reviewed-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-26nfsd4: recheck for secure ports in fh_verifyJ. Bruce Fields
patch 6fa02839bf9412e18e773d04e96182b4cd0b5d57 in mainline. As with 7fc90ec93a5eb71f4b08... "call nfsd_setuser() on fh_compose()..." this is a case where we need to redo a security check in fh_verify() even though the filehandle already has an associated dentry--if the filehandle was created by fh_compose() in an earlier operation of the nfsv4 compound, then we may not have done these checks yet. Without this fix it is possible, for example, to traverse from an export without the secure ports requirement to one with it in a single compound, and bypass the secure port check on the new export. While we're here, fix up some minor style problems and change a printk() to a dprintk(), to make it harder for random unprivileged users to spam the logs. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-By: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-26knfsd: fix spurious EINVAL errors on first access of new filesystemJ. Bruce Fields
patch ac8587dcb58e40dd336d99d60f852041e06cc3dd in mainline. The v2/v3 acl code in nfsd is translating any return from fh_verify() to nfserr_inval. This is particularly unfortunate in the case of an nfserr_dropit return, which is an internal error meant to indicate to callers that this request has been deferred and should just be dropped pending the results of an upcall to mountd. Thanks to Roland <devzero@web.de> for bug report and data collection. Cc: Roland <devzero@web.de> Acked-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-By: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16NFS: Fix a writeback race...Trond Myklebust
patch 61e930a904966cc37e0a3404276f0b73037e57ca in mainline This patch fixes a regression that was introduced by commit 44dd151d5c21234cc534c47d7382f5c28c3143cd We cannot zero the user page in nfs_mark_uptodate() any more, since a) We'd be modifying the page without holding the page lock b) We can race with other updates of the page, most notably because of the call to nfs_wb_page() in nfs_writepage_setup(). Instead, we do the zeroing in nfs_update_request() if we see that we're creating a request that might potentially be marked as up to date. Thanks to Olivier Paquet for reporting the bug and providing a test-case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16ocfs2: fix write() performance regressionMark Fasheh
patch 4e9563fd55ff4479f2b118d0757d121dd0cfc39c in mainline. ocfs2: fix write() performance regression On file systems which don't support sparse files, Ocfs2_map_page_blocks() was reading blocks on appending writes. This caused write performance to suffer dramatically. Fix this by detecting an appending write on a nonsparse fs and skipping the read. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16minixfs: limit minixfs printks on corrupted dir i_size (CVE-2006-6058)Eric Sandeen
patch f44ec6f3f89889a469773b1fd894f8fcc07c29cf upstream. This attempts to address CVE-2006-6058 http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058 first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html Essentially a corrupted minix dir inode reporting a very large i_size will loop for a very long time in minix_readdir, minix_find_entry, etc, because on EIO they just move on to try the next page. This is under the BKL, printk-storming as well. This can lock up the machine for a very long time. Simply ratelimiting the printks gets things back under control. Make the message a bit more informative while we're here. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Bodo Eggert <7eggert@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16xfs: eagerly remove vmap mappings to avoid upsetting XenJeremy Fitzhardinge
patch ace2e92e193126711cb3a83a3752b2c5b8396950 in mainline. XFS leaves stray mappings around when it vmaps memory to make it virtually contigious. This upsets Xen if one of those pages is being recycled into a pagetable, since it finds an extra writable mapping of the page. This patch solves the problem in a brute force way, by making XFS always eagerly unmap its mappings. [ Stable: This works around a bug in 2.6.23. We may come up with a better solution for mainline, but this seems like a low-impact fix for the stable kernel. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: XFS masters <xfs-masters@oss.sgi.com> Cc: Morten =?utf-8?q?B=C3=B8geskov?= <xen-users@morten.bogeskov.dk> Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16sched: keep utime/stime monotonicFrans Pop
sched: keep utime/stime monotonic cpustats use utime/stime as a ratio against sum_exec_runtime, as a consequence it can happen - when the ratio changes faster than time accumulates - that either can be appear to go backwards. Combined backport for 2.6.23 of the following patches from mainline: commit 73a2bcb0edb9ffb0b007b3546b430e2c6e415eee Author: Peter Zijlstra <a.p.zijlstra@chello.nl> sched: keep utime/stime monotonic commit 9301899be75b464ef097f0b5af7af6d9bd8f68a7 Author: Balbir Singh <balbir@linux.vnet.ibm.com> sched: fix /proc/<PID>/stat stime/utime monotonicity, part 2 Signed-off-by: Frans Pop <elendil@planet.nl> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16splice: fix double kunmap() in vmsplice copy pathJens Axboe
patch 6866bef40d06f7c2baac3a855b1917a8ca75456c in mainline. The out label should not include the unmap, the only way to jump there already has unmapped the source. 00002000 f7c21a00 00000000 00000000 c0489036 00018e32 00000002 00000000 00001000 Call Trace: [<c0487dd9>] pipe_to_user+0xca/0xd3 [<c0488233>] __splice_from_pipe+0x53/0x1bd [<c0454947>] ------------[ cut here ]------------ filemap_fault+0x221/0x380 [<c0487d0f>] pipe_to_user+0x0/0xd3 [<c0489036>] sys_vmsplice+0x3b7/0x422 [<c045ec3f>] kernel BUG at mm/highmem.c:206! handle_mm_fault+0x4d5/0x8eb [<c041ed5b>] kmap_atomic+0x1c/0x20 [<c045d33d>] unmap_vmas+0x3d1/0x584 [<c045f717>] free_pgtables+0x90/0xa0 [<c041d84b>] pgd_dtor+0x0/0x1 [<c044d665>] audit_syscall_exit+0x2aa/0x2c6 [<c0407817>] do_syscall_trace+0x124/0x169 [<c0404df2>] syscall_call+0x7/0xb ======================= Code: 2d 00 d0 5b 00 25 00 00 e0 ff 29 invalid opcode: 0000 [#1] c2 89 d0 c1 e8 0c 8b 14 85 a0 6c 7c c0 4a 85 d2 89 14 85 a0 6c 7c c0 74 07 31 c9 4a 75 15 eb 04 <0f> 0b eb fe 31 c9 81 3d 78 38 6d c0 78 38 6d c0 0f 95 c1 b0 01 EIP: [<c045bbc3>] kunmap_high+0x51/0x8e SS:ESP 0068:f5960df0 SMP Modules linked in: netconsole autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc ipv6 ib_iser rdma_cm ib_cm iw_cmib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath dm_mod video output sbs batteryac parport_pc lp parport sg i2c_piix4 i2c_core floppy cfi_probe gen_probe scb2_flash mtd chipreg tg3 e1000 button ide_cd serio_raw cdrom aic7xxx scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 3 EIP: 0060:[<c045bbc3>] Not tainted VLI EFLAGS: 00010246 (2.6.23 #1) EIP is at kunmap_high+0x51/0x8e Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16locks: fix possible infinite loop in posix deadlock detectionJ. Bruce Fields
patch 97855b49b6bac0bd25f16b017883634d13591d00 in mainline. It's currently possible to send posix_locks_deadlock() into an infinite loop (under the BKL). For now, fix this just by bailing out after a few iterations. We may want to fix this in a way that better clarifies the semantics of deadlock detection. But that will take more time, and this minimal fix is probably adequate for any realistic scenario, and is simple enough to be appropriate for applying to stable kernels now. Thanks to George Davis for reporting the problem. Cc: "George G. Davis" <gdavis@mvista.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-09NLM: Fix a memory leak in nlmsvc_testlockTrond Myklebust
The recent fix for a circular lock dependency unfortunately introduced a potential memory leak in the event where the call to nlmsvc_lookup_host fails for some reason. Thanks to Roel Kluin for spotting this. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-08AIO: fix cleanup in io_submit_one(...)Yan Zheng
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect, statement 'goto out_put_req' is executed. At label 'out_put_req', aio_put_req(..) is called, which requires 'req->ki_filp' set. Signed-off-by: Yan Zheng<yanzheng@21cn.com> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-03Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6 * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi> Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver Blackfin arch: add some missing syscall binfmt_flat: checkpatch fixing minimum support for the blackfin relocations Binfmt_flat: Add minimum support for the Blackfin relocations
2007-10-03ocfs2: Unlock mutex in local alloc failure caseSunil Mushran
The fs was not unlocking the local alloc inode mutex in the code path in which it failed to find a window of free bits in the global bitmap. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-01Fix possible splice() mmap_sem deadlockLinus Torvalds
Nick Piggin points out that splice isn't being good about the mmap semaphore: while two readers can nest inside each others, it does leave a possible deadlock if a writer (ie a new mmap()) comes in during that nesting. Original "just move the locking" patch by Nick, replaced by one by me based on an optimistic pagefault_disable(). And then Jens tested and updated that patch. Reported-by: Nick Piggin <npiggin@suse.de> Tested-by: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-01Revert "[XFS] Avoid replaying inode buffer initialisation log items if ↵Tim Shimmin
on-disk version is newer." This reverts commit b394e43e995d08821588a22561c6a71a63b4ff27. Lachlan McIlroy says: It tried to fix an issue where log replay is replaying an inode cluster initialisation transaction that should not be replayed because the inode cluster on disk is more up to date. Since we don't log file sizes (we rely on inode flushing to get them to disk) then we can't just replay all the transations in the log and expect the inode to be completely restored. We lose file size updates. Unfortunately this fix is causing more (serious) problems than it is fixing. SGI-PV: 969656 SGI-Modid: xfs-linux-melb:xfs-kern:29804a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-28NFS: Fix an Oops in encode_lookup()Trond Myklebust
It doesn't look as if the NFS file name limit is being initialised correctly in the struct nfs_server. Make sure that we limit whatever is being set in nfs_probe_fsinfo() and nfs_init_server(). Also ensure that readdirplus and nfs4_path_walk respect our file name limits. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26NLM: Fix a circular lock dependency in lockdTrond Myklebust
The problem is that the garbage collector for the 'host' structures nlm_gc_hosts(), holds nlm_host_mutex while calling down to nlmsvc_mark_resources, which, eventually takes the file->f_mutex. We cannot therefore call nlmsvc_lookup_host() from within nlmsvc_create_block, since the caller will already hold file->f_mutex, so the attempt to grab nlm_host_mutex may deadlock. Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: [PATCH] WE : Add missing auth compat-ioctl [PATCH] softmac: Fix inability to associate with WEP networks
2007-09-25Merge branch 'fixes-jgarzik' of ↵Jeff Garzik
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes
2007-09-25ufs: fix sun stateEvgeniy Dushistov
Different types of ufs hold state in different places, to hide complexity of this, there is ufs_get_fs_state, it returns state according to "UFS_SB(sb)->s_flags", but during mount ufs_get_fs_state is called, before setting s_flags, this cause message for ufs types like sun ufs: "fs need fsck", and remount in readonly state. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-22Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6Linus Torvalds
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] fix valid but harmless sparse warning [XFS] fix filestreams on 32-bit boxes
2007-10-03binfmt_flat: checkpatch fixing minimum support for the blackfin relocationsAndrew Morton
Cc: Bernd Schmidt <bernd.schmidt@analog.com> Cc: David McCullough <davidm@snapgear.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Miles Bader <miles.bader@necel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-10-03Binfmt_flat: Add minimum support for the Blackfin relocationsBernd Schmidt
Add minimum support for the Blackfin relocations, since we don't have enough space in each reloc. The idea is to store a value with one relocation so that subsequent ones can access it. Actually, this patch is required for Blackfin. Currently if BINFMT_FLAT is enabled, git-tree kernel will fail to compile. Signed-off-by: Bernd Schmidt <bernd.schmidt@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Cc: David McCullough <davidm@snapgear.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Miles Bader <miles.bader@necel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-09-21Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Pack vote message and response structures ocfs2: Don't double set write parameters ocfs2: Fix pos/len passed to ocfs2_write_cluster ocfs2: Allow smaller allocations during large writes
2007-09-21[PATCH] WE : Add missing auth compat-ioctlJean Tourrilhes
Johannes just found that we are missing a compat-ioctl declaration. The fix is trivial. As previous patches for compat-ioctl, this should also go to stable. More info : http://marc.info/?l=linux-wireless&m=119029667902588&w=2 Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-09-20ocfs2: Pack vote message and response structuresSunil Mushran
The ocfs2_vote_msg and ocfs2_response_msg structs needed to be packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this, we had inadvertantly broken 32/64 bit cross mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20ocfs2: Don't double set write parametersMark Fasheh
The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20ocfs2: Fix pos/len passed to ocfs2_write_clusterMark Fasheh
This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20ocfs2: Allow smaller allocations during large writesMark Fasheh
The ocfs2 write code loops through a page much like the block code, except that ocfs2 allocation units can be any size, including larger than page size. Typically it's equal to or larger than page size - most kernels run 4k pages, the minimum ocfs2 allocation (cluster) size. Some changes introduced during 2.6.23 changed the way writes to pages are handled, and inadvertantly broke support for > 4k page size. Instead of just writing one cluster at a time, we now handle the whole page in one pass. This means that multiple (small) seperate allocations might happen in the same pass. The allocation code howver typically optimizes by getting the maximum which was reserved. This triggered a BUG_ON in the extend code where it'd ask for a single bit (for one part of a > 4k page) and get back more than it asked for. Fix this by providing a variant of the high level allocation function which allows the caller to specify a maximum. The traditional function remains and just calls the new one with a maximum determined from the initial reservation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20signalfd simplificationDavide Libenzi
This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-20[XFS] fix valid but harmless sparse warningChristoph Hellwig
The new xlog_recover_do_reg_buffer checks call be16_to_cpu on di_gen which is a 32bit value so sparse rightly complains. Fortunately the warning is harmless because we don't care for the value, but only whether it's non-NULL. Due to that fact we can simply kill the endian swaps on this and the previous di_mode check entirely. SGI-PV: 969656 SGI-Modid: xfs-linux-melb:xfs-kern:29709a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-20[XFS] fix filestreams on 32-bit boxesEric Sandeen
xfs_filestream_mount() sets up an mru cache with: err = xfs_mru_cache_create(&mp->m_filestream, lifetime, grp_count, (xfs_mru_cache_free_func_t)xfs_fstrm_free_func); but that cast is causing problems... typedef void (*xfs_mru_cache_free_func_t)(unsigned long, void*); but: void xfs_fstrm_free_func( xfs_ino_t ino, fstrm_item_t *item) so on a 32-bit box, it's casting (32, 32) args into (64, 32) and I assume it's getting garbage for *item, which subsequently causes an explosion. With this change the filestreams xfsqa tests don't oops on my 32-bit box. SGI-PV: 967795 SGI-Modid: xfs-linux-melb:xfs-kern:29510a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-19Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6Linus Torvalds
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer. [XFS] Ensure file size updates have been completed before writing inode to disk. [XFS] On-demand reaping of the MRU cache
2007-09-19ext34: ensure do_split leaves enough free space in both blocksEric Sandeen
The do_split() function for htree dir blocks is intended to split a leaf block to make room for a new entry. It sorts the entries in the original block by hash value, then moves the last half of the entries to the new block - without accounting for how much space this actually moves. (IOW, it moves half of the entry *count* not half of the entry *space*). If by chance we have both large & small entries, and we move only the smallest entries, and we have a large new entry to insert, we may not have created enough space for it. The patch below stores each record size when calculating the dx_map, and then walks the hash-sorted dx_map, calculating how many entries must be moved to more evenly split the existing entries between the old block and the new block, guaranteeing enough space for the new entry. The dx_map "offs" member is reduced to u16 so that the overall map size does not change - it is temporarily stored at the end of the new block, and if it grows too large it may be overwritten. By making offs and size both u16, we won't grow the map size. Also add a few comments to the functions involved. This fixes the testcase reported by hooanon05@yahoo.co.jp on the linux-ext4 list, "ext3 dir_index causes an error" Thanks to Andreas Dilger for discussing the problem & solution with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Tested-by: Junjiro Okajima <hooanon05@yahoo.co.jp> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19nfs: fix oops re sysctls and V4 supportAlexey Dobriyan
NFS unregisters sysctls only if V4 support is compiled in. However, sysctl table is not V4 specific, so unregister it always. Steps to reproduce: [build nfs.ko with CONFIG_NFS_V4=n] modrobe nfs rmmod nfs ls /proc/sys Unable to handle kernel paging request at ffffffff880661c0 RIP: [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350 PGD 203067 PUD 207063 PMD 7e216067 PTE 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: lockd nfs_acl sunrpc Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2 RIP: 0010:[<ffffffff802af8e3>] [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350 RSP: 0018:ffff81007fd93e78 EFLAGS: 00010286 RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0 RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40 RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0 R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280 FS: 00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000) Stack: ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40 ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a 2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640 Call Trace: [<ffffffff80283f30>] filldir+0x0/0xf0 [<ffffffff80283f30>] filldir+0x0/0xf0 [<ffffffff802840c7>] vfs_readdir+0xa7/0xc0 [<ffffffff80284376>] sys_getdents+0x96/0xe0 [<ffffffff8020bb3e>] system_call+0x7e/0x83 Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b RIP [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350 RSP <ffff81007fd93e78> CR2: ffffffff880661c0 Kernel panic - not syncing: Fatal exception Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19dir_index: error out instead of BUG on corrupt dx dirsEric Sandeen
Convert asserts (BUGs) in dx_probe from bad on-disk data to recoverable errors with helpful warnings. With help catching other asserts from Duane Griffin <duaneg@dghda.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Duane Griffin <duaneg@dghda.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-18[XFS] Avoid replaying inode buffer initialisation log items if on-disk ↵Lachlan McIlroy
version is newer. SGI-PV: 969656 SGI-Modid: xfs-linux-melb:xfs-kern:29676a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-18[XFS] Ensure file size updates have been completed before writing inode to disk.Lachlan McIlroy
SGI-PV: 968767 SGI-Modid: xfs-linux-melb:xfs-kern:29675a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-17[XFS] On-demand reaping of the MRU cacheDavid Chinner
Instead of running the mru cache reaper all the time based on a timeout, we should only run it when the cache has active objects. This allows CPUs to sleep when there is no activity rather than be woken repeatedly just to check if there is anything to do. SGI-PV: 968554 SGI-Modid: xfs-linux-melb:xfs-kern:29305a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-09-15Merge branch 'fixes-jgarzik' of ↵Jeff Garzik
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes