summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-09-08cifs: fix O_APPEND on directio mountsJeff Layton
commit 838726c4756813576078203eb7e1e219db0da870 upstream The direct I/O write codepath for CIFS is done through cifs_user_write(). That function does not currently call generic_write_checks() so the file position isn't being properly set when the file is opened with O_APPEND. It's also not doing the other "normal" checks that should be done for a write call. The problem is currently that when you open a file with O_APPEND on a mount with the directio mount option, the file position is set to the beginning of the file. This makes any subsequent writes clobber the data in the file starting at the beginning. This seems to fix the problem in cursory testing. It is, however important to note that NFS disallows the combination of (O_DIRECT|O_APPEND). If my understanding is correct, the concern is races with multiple clients appending to a file clobbering each others' data. Since the write model for CIFS and NFS is pretty similar in this regard, CIFS is probably subject to the same sort of races. What's unclear to me is why this is a particular problem with O_DIRECT and not with buffered writes... Regardless, disallowing O_APPEND on an entire mount is probably not reasonable, so we'll probably just have to deal with it and reevaluate this flag combination when we get proper support for O_DIRECT. In the meantime this patch at least fixes the existing problem. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-09-08cramfs: fix named-pipe handlingAl Viro
commit 82d63fc9e30687c055b97928942b8893ea65b0bb upstream After commit a97c9bf33f4612e2aed6f000f6b1d268b6814f3c (fix cramfs making duplicate entries in inode cache) in kernel 2.6.14, named-pipe on cramfs does not work properly. It seems the commit make all named-pipe on cramfs share their inode (and named-pipe buffer). Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino == 1 immediately. Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-09-08nfsd: fix buffer overrun decoding NFSv4 aclJ. Bruce Fields
commit 91b80969ba466ba4b915a4a1d03add8c297add3f upstream The array we kmalloc() here is not large enough. Thanks to Johann Dahm and David Richter for bug report and testing. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: David Richter <richterd@citi.umich.edu> Tested-by: Johann Dahm <jdahm@umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20CIFS: Fix compiler warning on 64-bitJan Beulich
commit 04e1e0cccade330ab3715ce59234f7e3b087e246 upstream. Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Eugene Teo <eteo@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20CIFS: if get root inode fails during mount, cleanup tree connectionSteve French
commit 2c731afb0d4ba16018b400c75665fbdb8feb2175 upstream Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-20CIFS: mount of IPC$ breaks with iget patchSteve French
commit ad661334b8ae421154b121ee6ad3b56807adbf11 upstream In looking at network named pipe support on cifs, I noticed that Dave Howell's iget patch: iget: stop CIFS from using iget() and read_inode() broke mounts to IPC$ (the interprocess communication share), and don't handle the error case (when getting info on the root inode fails). Thanks to Gunter who noted a typo in a debug line in the original version of this patch. CC: David Howells <dhowells@redhat.com> CC: Gunter Kukkukk <linux@kukkukk.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06vfs: fix lookup on deleted directoryMiklos Szeredi
commit d70b67c8bc72ee23b55381bd6a884f4796692f77 upstream Lookup can install a child dentry for a deleted directory. This keeps the directory dentry alive, and the inode pinned in the cache and on disk, even after all external references have gone away. This isn't a big problem normally, since memory pressure or umount will clear out the directory dentry and its children, releasing the inode. But for UBIFS this causes problems because its orphan area can overflow. Fix this by returning ENOENT for all lookups on a S_DEAD directory before creating a child dentry. Thanks to Zoltan Sogor for noticing this while testing UBIFS, and Artem for the excellent analysis of the problem and testing. Reported-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06fat: detect media without partition table correctlyFrank Seidel
commit 0607fd02587a6b4b086dc746d63123c1f284db68 upstream I received a complaint that some FAT formated medias (e.g. sd memory cards) trigger a "unknown partition table" message even though there is no partition table and they work correctly, while in general (when e.g. formated with mkdosfs or even Windows Vista) this message is not shown. Currently this seems only to happen when the medias get formatted with Windows XP (and possibly Win 2000). Then the boot indicator byte contains garbage (part of text message) and so do the other parts checked by msdos_paritition which then later triggers this message. References: novell bug #364365 Most fat formatted media without partition table contains zeros in the boot indication and the other tested bytes and so falls through the checks in msdos_partition, leading it to return with 1 (all is fine). But some (e.g. WinXP formatted) fat fomated medias don't use boot_ind and so the check fails and causes a "unkown partition table" warning eventhough there is none and everything would be fine. This additional check directly verifies if there is a fat formatted medium without a partition table. Signed-off-by: Frank Seidel <fseidel@suse.de> Cc: Andreas Dilger <adilger@sun.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06FAT_VALID_MEDIA(): remove pointless testAndrew Morton
commit 73f20e58b1d586e9f6d3ddc3aad872829aca7743 upstream The on-disk media specification field in FAT is only 8-bits, so testing for <=0xff is pointless, and can generate a "comparison is always true due to limited range of data type" warning. While we're there, convert FAT_VALID_MEDIA() into a C function - the present implementation is buggy: it generates either one or two references to its argument. Cc: Frank Seidel <fseidel@suse.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06jbd: fix possible journal overflow issuesJosef Bacik
commit 5b9a499d77e9dd39c9e6611ea10c56a31604f274 upstream There are several cases where the running transaction can get buffers added to its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers counter end up larger than its t_outstanding_credits counter. This will cause issues when starting new transactions as while we are logging buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes negative, we will report that we need less space in the journal than we actually need, so transactions will be started even though there may not be enough room for them. In the worst case scenario (which admittedly is almost impossible to reproduce) this will result in the journal running out of space. The fix is to only refile buffers from the committing transaction to the running transactions BJ_Modified list when b_modified is set on that journal, which is the only way to be sure if the running transaction has modified that buffer. This patch also fixes an accounting error in journal_forget, it is possible that we can call journal_forget on a buffer without having modified it, only gotten write access to it, so instead of freeing a credit, we only do so if the buffer was modified. The assert will help catch if this problem occurs. Without these two patches I could hit this assert within minutes of running postmark, with them this issue no longer arises. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06jbd: fix race between free buffer and commit transactionMingming Cao
commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91 upstream journal_try_to_free_buffers() could race with jbd commit transaction when the later is holding the buffer reference while waiting for the data buffer to flush to disk. If the caller of journal_try_to_free_buffers() request tries hard to release the buffers, it will treat the failure as error and return back to the caller. We have seen the directo IO failed due to this race. Some of the caller of releasepage() also expecting the buffer to be dropped when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers(). With this patch, if the caller is passing the __GFP_WAIT and __GFP_FS to indicating this call could wait, in case of try_to_free_buffers() failed, let's waiting for journal_commit_transaction() to finish commit the current committing transaction, then try to free those buffers again. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06jbd: fix the way the b_modified flag is clearedJosef Bacik
commit 5bc833feaa8b2236265764e7e81f44937be46eda upstream Currently at the start of a journal commit we loop through all of the buffers on the committing transaction and clear the b_modified flag (the flag that is set when a transaction modifies the buffer) under the j_list_lock. The problem is that everywhere else this flag is modified only under the jbd lock buffer flag, so it will race with a running transaction who could potentially set it, and have it unset by the committing transaction. This is also a big waste, you can have several thousands of buffers that you are clearing the modified flag on when you may not need to. This patch removes this code and instead clears the b_modified flag upon entering do_get_write_access/journal_get_create_access, so if that transaction does indeed use the buffer then it will be accounted for properly, and if it does not then we know we didn't use it. That will be important for the next patch in this series. Tested thoroughly by myself using postmark/iozone/bonnie++. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06NFS: Ensure we zap only the access and acl caches when setting new aclsTrond Myklebust
commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the acls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06return to old errno choice in mkdir() et.al.Al Viro
commit e9baf6e59842285bcf9570f5094e4c27674a0f7c upstream In case when both EEXIST and EROFS would apply we used to return the former in mkdir(2) and friends. Lest anyone suspects us of being consistent, in the same situation knfsd gave clients nfs_erofs... ro-bind series had switched the syscall side of things to returning -EROFS and immediately broke an application - namely, mkdir -p. Patch restores the original behaviour... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-06romfs_readpage: don't report errors for pages beyond i_sizeLinus Torvalds
commit 0056e65f9e28d83ee1a3fb4f7d0041e838f03c34 upstream We zero-fill them like we are supposed to, and that's all fine. It's only an error if the 'romfs_copyfrom()' routine isn't able to fill the data that is supposed to be there. Most of the patch is really just re-organizing the code a bit, and using separate variables for the error value and for how much of the page we actually filled from the filesystem. Reported-and-tested-by: Chris Fester <cfester@wms.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Matt Waddel <matt.waddel@freescale.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01eCryptfs: use page_alloc not kmalloc to get a page of memoryEric Sandeen
commit 7fcba054373d5dfc43d26e243a5c9b92069972ee upstream Date: Mon, 28 Jul 2008 15:46:39 -0700 Subject: eCryptfs: use page_alloc not kmalloc to get a page of memory With SLUB debugging turned on in 2.6.26, I was getting memory corruption when testing eCryptfs. The root cause turned out to be that eCryptfs was doing kmalloc(PAGE_CACHE_SIZE); virt_to_page() and treating that as a nice page-aligned chunk of memory. But at least with SLUB debugging on, this is not always true, and the page we get from virt_to_page does not necessarily match the PAGE_CACHE_SIZE worth of memory we got from kmalloc. My simple testcase was 2 loops doing "rm -f fileX; cp /tmp/fileX ." for 2 different multi-megabyte files. With this change I no longer see the corruption. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01VFS: increase pseudo-filesystem block size to PAGE_SIZEAlex Nixon
commit 3971e1a917548977cff71418a7c3575ffbc9571f upstream This commit: commit ba52de123d454b57369f291348266d86f4b35070 Author: Theodore Ts'o <tytso@mit.edu> Date: Wed Sep 27 01:50:49 2006 -0700 [PATCH] inode-diet: Eliminate i_blksize from the inode structure caused the block size used by pseudo-filesystems to decrease from PAGE_SIZE to 1024 leading to a doubling of the number of context switches during a kernbench run. Signed-off-by: Alex Nixon <Alex.Nixon@citrix.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01isofs: fix minor filesystem corruptionAdam Greenblatt
commit c0a1633b6201ef79e31b7da464d44fdf5953054d upstream Some iso9660 images contain files with rockridge data that is either incorrect or incompletely parsed. Prior to commit f2966632a134e865db3c819346a1dc7d96e05309 ("[PATCH] rock: handle directory overflows") (included with kernel 2.6.13) the kernel ignored the rockridge data for these files, while still allowing the files to be accessed under their non-rockridge names. That commit inadvertently changed things so that files with invalid rockridge data could not be accessed at all. (I ran across the problem when comparing some old CDs with hard disk copies I had made long ago under kernel 2.4: a few of the files on the hard disk copies were no longer visible on the CDs.) This change reverts to the pre-2.6.13 behavior. Signed-off-by: Adam Greenblatt <adam.greenblatt@gmail.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01quota: fix possible infinite loop in quota codeJan Kara
commit b48d380541f634663b71766005838edbb7261685 upstream When quota structure is going to be dropped and it is dirty, quota code tries to write it. If the write fails for some reason (e. g. transaction cannot be started because the journal is aborted), we try writing again and again and again... Fix the problem by clearing the dirty bit even if the write failed. (akpm: for 2.6.27, 2.6.26.x and 2.6.25.x) Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24cifs: fix wksidarr declaration to be big-endian friendlyJeff Layton
commit 536abdb0802f3fac1b217530741853843d63c281 upstream The current definition of wksidarr works fine on little endian arches (since cpu_to_le32 is a no-op there), but on big-endian arches, it fails to compile with this error: error: braced-group within expression allowed only inside a function The problem is that this static declaration has cpu_to_le32 embedded within it, and that expands into a function macro. We need to use __constant_cpu_to_le32() instead. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24exec: fix stack excutability without PT_GNU_STACKHugh Dickins
commit 96a8e13ed44e380fc2bb6c711d74d5ba698c00b2 upstream Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32) exec'ing an ELF without a PT_GNU_STACK program header should default to an executable stack; but this got broken by the unlimited argv feature because stack vma is now created before the right personality has been established: so breaking old binaries using nested function trampolines. Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack vm_flags used to be set, before the mprotect_fixup. Checking through our existing VM_flags, none would have changed since insert_vm_struct: so this seems safer than finding a way through the personality labyrinth. Reported-by: pageexec@freemail.hu Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24reiserfs: discard prealloc in reiserfs_delete_inodeJeff Mahoney
commit eb35c218d83ec0780d9db869310f2e333f628702 upstream With the removal of struct file from the xattr code, reiserfs_file_release() isn't used anymore, so the prealloc isn't discarded. This causes hangs later down the line. This patch adds it to reiserfs_delete_inode. In most cases it will be a no-op due to it already having been called, but will avoid hangs with xattrs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24block: Properly notify block layer of sync writesJens Axboe
commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb upstream fsync_buffers_list() and sync_dirty_buffer() both issue async writes and then immediately wait on them. Conceptually, that makes them sync writes and we should treat them as such so that the IO schedulers can handle them appropriately. This patch fixes a write starvation issue that Lin Ming reported, where xx is stuck for more than 2 minutes because of a large number of synchronous IO in the system: INFO: task kjournald:20558 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kjournald D ffff810010820978 6712 20558 2 ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2 ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb 0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537 Call Trace: [<ffffffff803ba6f2>] kobject_get+0x12/0x17 [<ffffffff80247537>] getnstimeofday+0x2f/0x83 [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f [<ffffffff8066d195>] io_schedule+0x5d/0x9f [<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f [<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f [<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78 [<ffffffff80243909>] wake_bit_function+0x0/0x23 [<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb [<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6 [<ffffffff8023a676>] lock_timer_base+0x26/0x4b [<ffffffff8030300a>] kjournald+0xc1/0x1fb [<ffffffff802438db>] autoremove_wake_function+0x0/0x2e [<ffffffff80302f49>] kjournald+0x0/0x1fb [<ffffffff802437bb>] kthread+0x47/0x74 [<ffffffff8022de51>] schedule_tail+0x28/0x5d [<ffffffff8020cac8>] child_rip+0xa/0x12 [<ffffffff80243774>] kthread+0x0/0x74 [<ffffffff8020cabe>] child_rip+0x0/0x12 Lin Ming confirms that this patch fixes the issue. I've run tests with it for the past week and no ill effects have been observed, so I'm proposing it for inclusion into 2.6.26. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16cifs: fix oops on mount when CONFIG_CIFS_DFS_UPCALL is enabledMarcin Slusarz
simple "mount -t cifs //xxx /mnt" oopsed on strlen of options http://kerneloops.org/guilty.php?guilty=cifs_get_sb&version=2.6.25-release&start=16711 \ 68&end=1703935&class=oops Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16ecryptfs: fix missed mutex_unlockCyrill Gorcunov
upstream commit: 71fd5179e8d1d4d503b517e0c5374f7c49540bfc Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ecryptfs: clean up (un)lock_parentMiklos Szeredi
upstream commit: 8dc4e37362a5dc910d704d52ac6542bfd49ddc2f dget(dentry->d_parent) --> dget_parent(dentry) unlock_parent() is racy and unnecessary. Replace single caller with unlock_dir(). There are several other suspect uses of ->d_parent in ecryptfs... Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16eCryptfs: protect crypt_stat->flags in ecryptfs_open()Michael Halcrow
upstream commit: 2f9b12a31fcb738ea8c9eb0d4ddf906c6f1d696c Make sure crypt_stat->flags is protected with a lock in ecryptfs_open(). Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ecryptfs: add missing lock around notify_changeMiklos Szeredi
upstream commit: 9c3580aa52195699065bc2d7242b1c7e3e6903fa Callers of notify_change() need to hold i_mutex. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16double-free of inode on alloc_file() failure exit in create_write_pipe()Al Viro
upstream commit: ed1524371716466e9c762808b02601d0d0276a92 Duh... Fortunately, the bug is quite recent (post-2.6.25) and, embarrassingly, mine ;-/ http://bugzilla.kernel.org/show_bug.cgi?id=10878 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09capabilities: remain source compatible with 32-bit raw legacy capability ↵Andrew G. Morgan
support. upstream commit: ca05a99a54db1db5bca72eccb5866d2a86f8517f Source code out there hard-codes a notion of what the _LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the raw capability system calls capget() and capset(). Its unfortunate, but true. Since the confusing header file has been in a released kernel, there is software that is erroneously using 64-bit capabilities with the semantics of 32-bit compatibilities. These recently compiled programs may suffer corruption of their memory when sys_getcap() overwrites more memory than they are coded to expect, and the raising of added capabilities when using sys_capset(). As such, this patch does a number of things to clean up the situation for all. It 1. forces the _LINUX_CAPABILITY_VERSION define to always retain its legacy value. 2. adopts a new #define strategy for the kernel's internal implementation of the preferred magic. 3. deprecates v2 capability magic in favor of a new (v3) magic number. The functionality of v3 is entirely equivalent to v2, the only difference being that the v2 magic causes the kernel to log a "deprecated" warning so the admin can find applications that may be using v2 inappropriately. [User space code continues to be encouraged to use the libcap API which protects the application from details like this. libcap-2.10 is the first to support v3 capabilities.] Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518. Thanks to Bojan Smojver for the report. [akpm@linux-foundation.org: s/depreciate/deprecate/g] [akpm@linux-foundation.org: be robust about put_user size] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Bojan Smojver <bojan@rexursive.com> Cc: stable@kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09eCryptfs: remove unnecessary page decrypt callMichael Halcrow
upstream commit: d3e49afbb66109613c3474f2273f5830ac2dcb09 The page decrypt calls in ecryptfs_write() are both pointless and buggy. Pointless because ecryptfs_get_locked_page() has already brought the page up to date, and buggy because prior mmap writes will just be blown away by the decrypt call. This patch also removes the declaration of a now-nonexistent function ecryptfs_write_zeros(). Thanks to Eric Sandeen and David Kleikamp for helping to track this down. Eric said: fsx w/ mmap dies quickly ( < 100 ops) without this, and survives nicely (to millions of ops+) with it in place. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [chrisw: backport to 2.6.25.5] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09pagemap: fix bug in add_to_pagemap, require aligned-length reads of ↵Thomas Tuttle
/proc/pid/pagemap upstream commit: aae8679b0ebcaa92f99c1c3cb0cd651594a43915 Fix a bug in add_to_pagemap. Previously, since pm->out was a char *, put_user was only copying 1 byte of every PFN, resulting in the top 7 bytes of each PFN not being copied. By requiring that reads be a multiple of 8 bytes, I can make pm->out and pm->end u64*s instead of char*s, which makes put_user work properly, and also simplifies the logic in add_to_pagemap a bit. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Thomas Tuttle <ttuttle@google.com> Cc: Matt Mackall <mpm@selenic.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09proc: calculate the correct /proc/<pid> link countVegard Nossum
upstream commit: aed5417593ad125283f35513573282139a8664b5 This patch: commit e9720acd728a46cb40daa52c99a979f7c4ff195c Author: Pavel Emelyanov <xemul@openvz.org> Date: Fri Mar 7 11:08:40 2008 -0800 [NET]: Make /proc/net a symlink on /proc/self/net (v3) introduced a /proc/self/net directory without bumping the corresponding link count for /proc/self. This patch replaces the static link count initializations with a call that counts the number of directory entries in the given pid_entry table whenever it is instantiated, and thus relieves the burden of manually keeping the two in sync. [akpm@linux-foundation.org: cleanup] Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09XFS: Fix memory corruption with small buffer readsChristoph Hellwig
upstream commit: 6ab455eeaff6893cd06da33843e840d888cdc04a When we have multiple buffers in a single page for a blocksize == pagesize filesystem we might overwrite the page contents if two callers hit it shortly after each other. To prevent that we need to keep the page locked until I/O is completed and the page marked uptodate. Thanks to Eric Sandeen for triaging this bug and finding a reproducible testcase and Dave Chinner for additional advice. This should fix kernel.org bz #10421. Tested-by: Eric Sandeen <sandeen@sandeen.net> SGI-PV: 981813 SGI-Modid: xfs-linux-melb:xfs-kern:31173a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09CIFS: Fix UNC path prefix on QueryUnixPathInfo to have correct slashSteve French
upstream commit: 076d8423a98659a92837b07aa494cb74bfefe77c When a share was in DFS and the server was Unix/Linux, we were sending paths of the form \\server\share/dir/file rather than //server/share/dir/file There was some discussion between me and jra over whether we should use /server/share/dir/file as MS sometimes says - but the documentation for this claims it should be doubleslash for this type of UNC-like path format and that works, so leaving it as doubleslash but converting the \ to / in the the //server/share portion. This gets Samba to now correctly return STATUS_PATH_NOT_COVERED when it is supposed to (Windows already did since the direction of the slash was not an issue for them). Still need another minor change to fully enable DFS (need to finish some chages to SMBGetDFSRefer Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-09ext3/4: fix uninitialized bs in ext3/4_xattr_set_handle()Tiger Yang
commit 7e01c8e5420b6c7f9d85d34c15d8c7a15c9fc720 upstream This fix the uninitialized bs when we try to replace a xattr entry in ibody with the new value which require more than free space. This situation only happens we format ext3/4 with inode size more than 128 and we have put xattr entries both in ibody and block. The consequences about this bug is we will lost the xattr block which pointed by i_file_acl with all xattr entires in it. We will alloc a new xattr block and put that large value entry in it. The old xattr block will become orphan block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Gruenbacher <agruen@suse.de> Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-06asn1: additional sanity checking during BER decoding (CVE-2008-1673)Chris Wright
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5 - Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09reiserfs: Unpack tails on quota filesJan Kara
commit d5dee5c395062a55236318ac4eec1f4ebb9de6db upstream Quota files cannot have tails because quota_write and quota_read functions do not support them. So far when quota files did have tail, we just refused to turn quotas on it. Sadly this check has been wrong and so there are now plenty installations where quota files don't have NOTAIL flag set and so now after fixing the check, they suddently fail to turn quotas on. Since it's easy to unpack the tail from kernel, do this from reiserfs_quota_on() which solves the problem and is generally nicer to users anyway. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: <urhausen@urifabi.net> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-09vfs: fix permission checking in sys_utimensatMiklos Szeredi
commit: 02c6be615f1fcd37ac5ed93a3ad6692ad8991cd9 upstream If utimensat() is called with both times set to UTIME_NOW or one of them to UTIME_NOW and the other to UTIME_OMIT, then it will update the file time without any permission checking. I don't think this can be used for anything other than a local DoS, but could be quite bewildering at that (e.g. "Why was that large source tree rebuilt when I didn't modify anything???") This affects all kernels from 2.6.22, when the utimensat() syscall was introduced. Fix by doing the same permission checking as for the "times == NULL" case. Thanks to Michael Kerrisk, whose utimensat-non-conformances-and-fixes.patch in -mm also fixes this (and breaks other stuff), only he didn't realize the security implications of this bug. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-06fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)Al Viro
commit 0b2bac2f1ea0d33a3621b27ca68b9ae760fca2e9 upstream. fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). Solution is to hold ->file_lock around fcheck() in there; that orders us wrt removal from descriptor table that preceded locks_remove_posix() on close path and we either come first (in which case eviction will be handled by the close side) or we'll see the effect of close and do eviction ourselves. Note that even though it's read-only access, we do need ->file_lock here - rcu_read_lock() won't be enough to order the things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01Fix dnotify/close race (CVE-2008-1375)Al Viro
commit 214b7049a7929f03bbd2786aaef04b8b79db34e2 upstream. We have a race between fcntl() and close() that can lead to dnotify_struct inserted into inode's list *after* the last descriptor had been gone from current->files. Since that's the only point where dnotify_struct gets evicted, we are screwed - it will stick around indefinitely. Even after struct file in question is gone and freed. Worse, we can trigger send_sigio() on it at any later point, which allows to send an arbitrary signal to arbitrary process if we manage to apply enough memory pressure to get the page that used to host that struct file and fill it with the right pattern... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01aio: io_getevents() should return if io_destroy() is invokedJeff Moyer
commit e92adcba261fd391591bb63c1703185a04a41554 upstream This patch wakes up a thread waiting in io_getevents if another thread destroys the context. This was tested using a small program that spawns a thread to wait in io_getevents while the parent thread destroys the io context and then waits for the getevents thread to exit. Without this patch, the program hangs indefinitely. With the patch, the program exits as expected. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Christopher Smith <x@xman.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01JFFS2: Fix free space leak with in-band cleanmarkersDavid Woodhouse
We were accounting for the cleanmarker by calling jffs2_link_node_ref() (without locking!), which adjusted both superblock and per-eraseblock accounting, subtracting the size of the cleanmarker from {jeb,c}->free_size and adding it to {jeb,c}->used_size. But only _then_ were we adding the size of the newly-erased block back to the superblock counts, and we were adding each of jeb->{free,used}_size to the corresponding superblock counts. Thus, the size of the cleanmarker was effectively subtracted from the superblock's free_size _twice_. Fix this, by always adding a full eraseblock size to c->free_size when we've erased a block. And call jffs2_link_node_ref() under the proper lock, while we're at it. Thanks to Alexander Yurchenko and/or Damir Shayhutdinov for (almost) pinpointing the problem. [Backport of commit 014b164e1392a166fe96e003d2f0e7ad2e2a0bb7] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-16AFS: Do not describe debug parameters with their valuePaul Bolle
Describe debug parameters with their names (and not their values). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrsJan Kara
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But filesystems are calling this function while holding xattr_sem so possible recursion into the fs violates locking ordering of xattr_sem and transaction start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems can specify desired gfp mask and use GFP_NOFS from all of them. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Dave Jones <davej@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-14JFFS2 Fix of panics caused by wrong condition for hole frag creation in ↵Alexey Korolev
write_begin This fixes a regression introduced in commit 205c109a7a96d9a3d8ffe64c4068b70811fef5e8 when switching to write_begin/write_end operations in JFFS2. The page offset is miscalculated, leading to corruption of the fragment lists and subsequently to memory corruption and panics. [ Side note: the bug is a fairly direct result of the naming. Nick was likely misled by the use of "offs", since we tend to use the notion of "offset" not as an absolute position, but as an offset _within_ a page or allocation. Alternatively, a "pgoff_t" is a page index, but not a byte offset - our VM naming can be a bit confusing. So in this case, a VM person would likely have called this a "pos", not an "offs", or perhaps talked about byte offsets rather than page offsets (since it's counted in bytes, not pages). - Linus ] Signed-off-by: Alexey Korolev <akorolev@infradead.org> Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-14locks: fix possible infinite loop in fcntl(F_SETLKW) over nfsJ. Bruce Fields
Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and I: 7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs does can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-11Merge branch 'docs' of git://git.lwn.net/linux-2.6Linus Torvalds
* 'docs' of git://git.lwn.net/linux-2.6: Add additional examples in Documentation/spinlocks.txt Move sched-rt-group.txt to scheduler/ Documentation: move rpc-cache.txt to filesystems/ Documentation: move nfsroot.txt to filesystems/ Spell out behavior of atomic_dec_and_lock() in kerneldoc Fix a typo in highres.txt Fixes to the seq_file document Fill out information on patch tags in SubmittingPatches Add the seq_file documentation
2008-04-11Documentation: move nfsroot.txt to filesystems/J. Bruce Fields
Documentation/ is a little large, and filesystems/ seems an obvious place for this file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-11signalfd: fix for incorrect SI_QUEUE user data reportingDavide Libenzi
Michael Kerrisk found out that signalfd was not reporting back user data pushed using sigqueue: http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123 The following patch makes signalfd report back the ssi_ptr and ssi_int members of the signalfd_siginfo structure. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>