summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-04-22proc: do proper range check on readdir offsetLinus Torvalds
commit d8bdc59f215e62098bc5b4256fd9928bf27053a1 upstream. Rather than pass in some random truncated offset to the pid-related functions, check that the offset is in range up-front. This is just cleanup, the previous commit fixed the real problem. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22UBIFS: fix oops when R/O file-system is fsync'edArtem Bityutskiy
commit 78530bf7f2559b317c04991b52217c1608d5a58d upstream. This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an file on R/O-mounter file-system. We (the UBIFS authors) incorrectly thought that VFS would not propagate 'fsync()' down to the file-system if it is read-only, but this is not the case. It is easy to exploit this bug using the following simple perl script: use strict; use File::Sync qw(fsync sync); die "File path is not specified" if not defined $ARGV[0]; my $path = $ARGV[0]; open FILE, "<", "$path" or die "Cannot open $path: $!"; fsync(\*FILE) or die "cannot fsync $path: $!"; close FILE or die "Cannot close $path: $!"; Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reported-by: Reuben Dowle <Reuben.Dowle@navico.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22ramfs: fix memleak on no-mmu archBob Liu
commit b836aec53e2bce71de1d5415313380688c851477 upstream. On no-mmu arch, there is a memleak during shmem test. The cause of this memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2 which makes iput() can't free that pages. The simple test file is like this: int main(void) { int i; key_t k = ftok("/etc", 42); for ( i=0; i<100; ++i) { int id = shmget(k, 10000, 0644|IPC_CREAT); if (id == -1) { printf("shmget error\n"); } if(shmctl(id, IPC_RMID, NULL ) == -1) { printf("shm rm error\n"); return -1; } } printf("run ok...\n"); return 0; } And the result: root:/> free total used free shared buffers Mem: 60320 17912 42408 0 0 -/+ buffers: 17912 42408 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 19096 41224 0 0 -/+ buffers: 19096 41224 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 20296 40024 0 0 -/+ buffers: 20296 40024 ... After this patch the test result is:(no memleak anymore) root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22UBIFS: restrict world-writable debugfs filesVasiliy Kulikov
commit 8c559d30b4e59cf6994215ada1fe744928f494bf upstream. Don't allow everybody to dump sensitive information about filesystems. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22cifs: always do is_path_accessible check in cifs_mountJeff Layton
commit 70945643722ffeac779d2529a348f99567fa5c33 upstream. Currently, we skip doing the is_path_accessible check in cifs_mount if there is no prefixpath. I have a report of at least one server however that allows a TREE_CONNECT to a share that has a DFS referral at its root. The reporter in this case was using a UNC that had no prefixpath, so the is_path_accessible check was not triggered and the box later hit a BUG() because we were chasing a DFS referral on the root dentry for the mount. This patch fixes this by removing the check for a zero-length prefixpath. That should make the is_path_accessible check be done in this situation and should allow the client to chase the DFS referral at mount time instead. Reported-and-Tested-by: Yogesh Sharma <ysharma@cymer.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14xfs: zero proper structure size for geometry callsAlex Elder
commit af24ee9ea8d532e16883251a6684dfa1be8eec29 upstream. Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to xfs_fs_geometry() in order to avoid passing kernel stack data back to user space: + memset(geo, 0, sizeof(*geo)); Unfortunately, one of the callers of that function passes the address of a smaller data type, cast to fit the type that xfs_fs_geometry() requires. As a result, this can happen: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f87aca93 Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1 Call Trace: [<c12991ac>] ? panic+0x50/0x150 [<c102ed71>] ? __stack_chk_fail+0x10/0x18 [<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs] Fix this by fixing that one caller to pass the right type and then copy out the subset it is interested in. Note: This patch is an alternative to one originally proposed by Eric Sandeen. Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14exec: copy-and-paste the fixes into compat_do_execve() pathsOleg Nesterov
commit 114279be2120a916e8a04feeb2ac976a10016f2f upstream. Note: this patch targets 2.6.37 and tries to be as simple as possible. That is why it adds more copy-and-paste horror into fs/compat.c and uglifies fs/exec.c, this will be cleanuped later. compat_copy_strings() plays with bprm->vma/mm directly and thus has two problems: it lacks the RLIMIT_STACK check and argv/envp memory is not visible to oom killer. Export acct_arg_size() and get_arg_page(), change compat_copy_strings() to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0) as do_execve() does. Add the fatal_signal_pending/cond_resched checks into compat_count() and compat_copy_strings(), this matches the code in fs/exec.c and certainly makes sense. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14exec: make argv/envp memory visible to oom-killerOleg Nesterov
commit 3c77f845722158206a7209c45ccddc264d19319c upstream. Brad Spengler published a local memory-allocation DoS that evades the OOM-killer (though not the virtual memory RLIMIT): http://www.grsecurity.net/~spender/64bit_dos.c execve()->copy_strings() can allocate a lot of memory, but this is not visible to oom-killer, nobody can see the nascent bprm->mm and take it into account. With this patch get_arg_page() increments current's MM_ANONPAGES counter every time we allocate the new page for argv/envp. When do_execve() succeds or fails, we change this counter back. Technically this is not 100% correct, we can't know if the new page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but I don't think this really matters and everything becomes correct once exec changes ->mm or fails. Reported-by: Brad Spengler <spender@grsecurity.net> Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14nfsd: fix auth_domain reference leak on nlm operationsJ. Bruce Fields
commit 954032d2527f2fce7355ba70709b5e143d6b686f upstream. This was noticed by users who performed more than 2^32 lock operations and hence made this counter overflow (eventually leading to use-after-free's). Setting rq_client to NULL here means that it won't later get auth_domain_put() when it should be. Appears to have been introduced in 2.5.42 by "[PATCH] kNFSd: Move auth domain lookup into svcauth" which moved most of the rq_client handling to common svcauth code, but left behind this one line. Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14ext4: fix credits computing for indirect mapped filesYongqiang Yang
commit 5b41395fcc0265fc9f193aef9df39ce49d64677c upstream. When writing a contiguous set of blocks, two indirect blocks could be needed depending on how the blocks are aligned, so we need to increase the number of credits needed by one. [ Also fixed a another bug which could further underestimate the number of journal credits needed by 1; the code was using integer division instead of DIV_ROUND_UP() -- tytso] Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Squashfs: handle corruption of directory structurePhillip Lougher
commit 44cff8a9ee8a974f9e931df910688e7fc1f0b0f9 upstream. Handle the rare case where a directory metadata block is uncompressed and corrupted, leading to a kernel oops in directory scanning (memcpy). Normally corruption is detected at the decompression stage and dealt with then, however, this will not happen if: - metadata isn't compressed (users can optionally request no metadata compression), or - the compressed metadata block was larger than the original, in which case the uncompressed version was used, or - the data was corrupt after decompression This patch fixes this by adding some sanity checks against known maximum values. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Treat writes as new when holes span across page boundariesGoldwyn Rodrigues
commit 272b62c1f0f6f742046e45b50b6fec98860208a0 upstream. When a hole spans across page boundaries, the next write forces a read of the block. This could end up reading existing garbage data from the disk in ocfs2_map_page_blocks. This leads to non-zero holes. In order to avoid this, mark the writes as new when the holes span across page boundaries. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: jlbec <jlbec@evilplan.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14quota: Don't write quota info in dquot_commit()Jan Kara
commit b03f24567ce7caf2420b8be4c6eb74c191d59a91 upstream. There's no reason to write quota info in dquot_commit(). The writing is a relict from the old days when we didn't have dquot_acquire() and dquot_release() and thus dquot_commit() could have created / removed quota structures from the file. These days dquot_commit() only updates usage counters / limits in quota structure and thus there's no need to write quota info. This also fixes an issue with journaling filesystem which didn't reserve enough space in the transaction for write of quota info (it could have been dirty at the time of dquot_commit() because of a race with other operation changing it). Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14UBIFS: fix debugging failure in dbg_check_space_infoArtem Bityutskiy
commit 7da6443aca9be29c6948dcbd636ad50154d0bc0c upstream. This patch fixes a debugging failure with which looks like this: UBIFS error (pid 32313): dbg_check_space_info: free space changed from 6019344 to 6022654 The reason for this failure is described in the comment this patch adds to the code. But in short - 'c->freeable_cnt' may be different before and after re-mounting, and this is normal. So the debugging code should make sure that free space calculations do not depend on 'c->freeable_cnt'. A similar issue has been reported here: http://lists.infradead.org/pipermail/linux-mtd/2011-April/034647.html This patch should fix it. For the -stable guys: this patch is only relevant for kernels 2.6.30 onwards. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14UBIFS: fix oops on error path in read_pnodeArtem Bityutskiy
commit 54acbaaa523ca0bd284a18f67ad213c379679e86 upstream. Thanks to coverity which spotted that UBIFS will oops if 'kmalloc()' in 'read_pnode()' fails and we dereference a NULL 'pnode' pointer when we 'goto out'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14UBIFS: do not read flash unnecessarilyArtem Bityutskiy
commit 8b229c76765816796eec7ccd428f03bd8de8b525 upstream. This fix makes the 'dbg_check_old_index()' function return immediately if debugging is disabled, instead of executing incorrect 'goto out' which causes UBIFS to: 1. Allocate memory 2. Read the flash On every commit. OK, we do not commit that often, but it is still silly to do unneeded I/O anyway. Credits to coverity for spotting this silly issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Btrfs: Fix uninitialized root flags for subvolumesLi Zefan
commit 08fe4db170b4193603d9d31f40ebaf652d07ac9c upstream. root_item->flags and root_item->byte_limit are not initialized when a subvolume is created. This bug is not revealed until we added readonly snapshot support - now you mount a btrfs filesystem and you may find the subvolumes in it are readonly. To work around this problem, we steal a bit from root_item->inode_item->flags, and use it to indicate if those fields have been properly initialized. When we read a tree root from disk, we check if the bit is set, and if not we'll set the flag and initialize the two fields of the root item. Reported-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14nilfs2: fix data loss in mmap page write for hole blocksRyusuke Konishi
commit 34094537943113467faee98fe67c8a3d3f9a0a8b upstream. From the result of a function test of mmap, mmap write to shared pages turned out to be broken for hole blocks. It doesn't write out filled blocks and the data will be lost after umount. This is due to a bug that the target file is not queued for log writer when filling hole blocks. Also, nilfs_page_mkwrite function exits normal code path even after successfully filled hole blocks due to a change of block_page_mkwrite function; just after nilfs was merged into the mainline, block_page_mkwrite() started to return VM_FAULT_LOCKED instead of zero by the patch "mm: close page_mkwrite races" (commit: b827e496c893de0c). The current nilfs_page_mkwrite() is not handling this value properly. This corrects nilfs_page_mkwrite() and will resolve the data loss problem in mmap write. [This should be applied to every kernel since 2.6.30 but a fix is needed for 2.6.37 and prior kernels] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1Dan Rosenberg
commit c4d0c3b097f7584772316ee4d64a09fe0e4ddfca upstream. The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to xfs_fs_geometry() with a version number of 3. This code path does not fill in the logsunit member of the passed xfs_fsop_geom_t, leading to the leaking of four bytes of uninitialized stack data to potentially unprivileged callers. v2 switches to memset() to avoid future issues if structure members change, on suggestion of Dave Chinner. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.org> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14eCryptfs: ecryptfs_keyring_auth_tok_for_sig() bug fixRoberto Sassu
commit 1821df040ac3cd6a57518739f345da6d50ea9d3f upstream. The pointer '(*auth_tok_key)' is set to NULL in case request_key() fails, in order to prevent its use by functions calling ecryptfs_keyring_auth_tok_for_sig(). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14eCryptfs: Unlock page in write_begin error pathTyler Hicks
commit 50f198ae16ac66508d4b8d5a40967a8507ad19ee upstream. Unlock the page in error path of ecryptfs_write_begin(). This may happen, for example, if decryption fails while bring the page up-to-date. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27fs: call security_d_instantiate in d_obtain_alias V2Josef Bacik
commit 24ff6663ccfdaf088dfa7acae489cb11ed4f43c4 upstream. While trying to track down some NFS problems with BTRFS, I kept noticing I was getting -EACCESS for no apparent reason. Eric Paris and printk() helped me figure out that it was SELinux that was giving me grief, with the following denial type=AVC msg=audit(1290013638.413:95): avc: denied { 0x800000 } for pid=1772 comm="nfsd" name="" dev=sda1 ino=256 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file Turns out this is because in d_obtain_alias if we can't find an alias we create one and do all the normal instantiation stuff, but we don't do the security_d_instantiate. Usually we are protected from getting a hashed dentry that hasn't yet run security_d_instantiate() by the parent's i_mutex, but obviously this isn't an option there, so in order to deal with the case that a second thread comes in and finds our new dentry before we get to run security_d_instantiate(), we go ahead and call it if we find a dentry already. Eric assures me that this is ok as the code checks to see if the dentry has been initialized already so calling security_d_instantiate() against the same dentry multiple times is ok. With this patch I'm no longer getting errant -EACCESS values. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27nfsd: wrong index used in inner loopMi Jinlong
commit 5a02ab7c3c4580f94d13c683721039855b67cda6 upstream. We must not use dummy for index. After the first index, READ32(dummy) will change dummy!!!! Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> [bfields@redhat.com: Trond points out READ_BUF alone is sufficient.] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27nfsd41: modify the members value of nfsd4_op_flagsMi Jinlong
commit 5ece3cafbd88d4da5c734e1810c4a2e6474b57b2 upstream. The members of nfsd4_op_flags, (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS) equals to ALLOWED_AS_FIRST_OP, maybe that's not what we want. OP_PUTROOTFH with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS, can't appears as the first operation with out SEQUENCE ops. This patch modify the wrong value of ALLOWED_WITHOUT_FH etc which was introduced by f9bb94c4. Reviewed-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27proc: protect mm start_code/end_code in /proc/pid/statKees Cook
commit 5883f57ca0008ffc93e09cbb9847a1928e50c6f3 upstream. While mm->start_stack was protected from cross-uid viewing (commit f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged processes")), the start_code and end_code values were not. This would allow the text location of a PIE binary to leak, defeating ASLR. Note that the value "1" is used instead of "0" for a protected value since "ps", "killall", and likely other readers of /proc/pid/stat, take start_code of "0" to mean a kernel thread and will misbehave. Thanks to Brad Spengler for pointing this out. Addresses CVE-2011-0726 Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27procfs: fix /proc/<pid>/maps heap checkAaro Koskinen
commit 0db0c01b53a1a421513f91573241aabafb87802a upstream. The current code fails to print the "[heap]" marking if the heap is split into multiple mappings. Fix the check so that the marking is displayed in all possible cases: 1. vma matches exactly the heap 2. the heap vma is merged e.g. with bss 3. the heap vma is splitted e.g. due to locked pages Test cases. In all cases, the process should have mapping(s) with [heap] marking: (1) vma matches exactly the heap #include <stdio.h> #include <unistd.h> #include <sys/types.h> int main (void) { if (sbrk(4096) != (void *)-1) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test1 check /proc/553/maps [1] + Stopped ./test1 # cat /proc/553/maps | head -4 00008000-00009000 r-xp 00000000 01:00 3113640 /test1 00010000-00011000 rw-p 00000000 01:00 3113640 /test1 00011000-00012000 rw-p 00000000 00:00 0 [heap] 4006f000-40070000 rw-p 00000000 00:00 0 (2) the heap vma is merged #include <stdio.h> #include <unistd.h> #include <sys/types.h> char foo[4096] = "foo"; char bar[4096]; int main (void) { if (sbrk(4096) != (void *)-1) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test2 check /proc/556/maps [2] + Stopped ./test2 # cat /proc/556/maps | head -4 00008000-00009000 r-xp 00000000 01:00 3116312 /test2 00010000-00012000 rw-p 00000000 01:00 3116312 /test2 00012000-00014000 rw-p 00000000 00:00 0 [heap] 4004a000-4004b000 rw-p 00000000 00:00 0 (3) the heap vma is splitted (this fails without the patch) #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <sys/types.h> int main (void) { if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) && (sbrk(4096) != (void *)-1)) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test3 check /proc/559/maps [1] + Stopped ./test3 # cat /proc/559/maps|head -4 00008000-00009000 r-xp 00000000 01:00 3119108 /test3 00010000-00011000 rw-p 00000000 01:00 3119108 /test3 00011000-00012000 rw-p 00000000 00:00 0 [heap] 00012000-00013000 rw-p 00000000 00:00 0 [heap] It looks like the bug has been there forever, and since it only results in some information missing from a procfile, it does not fulfil the -stable "critical issue" criteria. Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27ext3: skip orphan cleanup on rocompat fsAmir Goldstein
commit ce654b37f87980d95f339080e4c3bdb2370bdf22 upstream. Orphan cleanup is currently executed even if the file system has some number of unknown ROCOMPAT features, which deletes inodes and frees blocks, which could be very bad for some RO_COMPAT features. This patch skips the orphan cleanup if it contains readonly compatible features not known by this ext3 implementation, which would prevent the fs from being mounted (or remounted) readwrite. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27aio: wake all waiters when destroying ctxRoland Dreier
commit e91f90bb0bb10be9cc8efd09a3cf4ecffcad0db1 upstream. The test program below will hang because io_getevents() uses add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only wakes up one of the threads. Fix this by using wake_up_all() in the aio code paths where we want to make sure no one gets stuck. // t.c -- compile with gcc -lpthread -laio t.c #include <libaio.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> static const int nthr = 2; void *getev(void *ctx) { struct io_event ev; io_getevents(ctx, 1, 1, &ev, NULL); printf("io_getevents returned\n"); return NULL; } int main(int argc, char *argv[]) { io_context_t ctx = 0; pthread_t thread[nthr]; int i; io_setup(1024, &ctx); for (i = 0; i < nthr; ++i) pthread_create(&thread[i], NULL, getev, ctx); sleep(1); io_destroy(ctx); for (i = 0; i < nthr; ++i) pthread_join(thread[i], NULL); return 0; } Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23ext3: Always set dx_node's fake_dirent explicitly.Eric Sandeen
commit d7433142b63d727b5a217c37b1a1468b116a9771 upstream. (crossport of 1f7bebb9e911d870fa8f997ddff838e82b5715ea by Andreas Schlick <schlick@lavabit.com>) When ext3_dx_add_entry() has to split an index node, it has to ensure that name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck won't recognise it as an intermediate htree node and consider the htree to be corrupted. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-14nfsd: wrong index used in inner looproel
commit 3ec07aa9522e3d5e9d5ede7bef946756e623a0a0 upstream. Index i was already used in the outer loop Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07CIFS: Fix oplock break handling (try #2)Pavel Shilovsky
commit 12fed00de963433128b5366a21a55808fab2f756 upstream. When we get oplock break notification we should set the appropriate value of OplockLevel field in oplock break acknowledge according to the oplock level held by the client in this time. As we only can have level II oplock or no oplock in the case of oplock break, we should be aware only about clientCanCacheRead field in cifsInodeInfo structure. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07ext2: Fix link count corruption under heavy link+rename loadJosh Hunt
commit e8a80c6f769dd4622d8b211b398452158ee60c0b upstream. vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt it as reported and analyzed by Josh. In fact, there is no good reason to mess with i_nlink of the moved file. We did it presumably to simulate linking into the new directory and unlinking from an old one. But the practical effect of this is disputable because fsck can possibly treat file as being properly linked into both directories without writing any error which is confusing. So we just stop increment-decrement games with i_nlink which also fixes the corruption. CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07fuse: fix hang of single threaded fuseblk filesystemMiklos Szeredi
commit 5a18ec176c934ca1bc9dc61580a5e0e90a9b5733 upstream. Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a ↵Tristan Ye
right number. commit acf3bb007e5636ef4c17505affb0974175108553 upstream. Current refcounttree codes actually didn't writeback the new pages out in write-back mode, due to a bug of always passing a ZERO number of clusters to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02ldm: corrupted partition table can cause kernel oopsTimo Warns
commit 294f6cf48666825d23c9372ef37631232746e40d upstream. The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch changes ldm_parse_vmdb() to Validate the value of vblk_size. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Richard Russon <ldm@flatcap.org> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02epoll: prevent creating circular epoll structuresDavide Libenzi
commit 22bacca48a1755f79b7e0f192ddb9fbb7fc6e64e upstream. In several places, an epoll fd can call another file's ->f_op->poll() method with ep->mtx held. This is in general unsafe, because that other file could itself be an epoll fd that contains the original epoll fd. The code defends against this possibility in its own ->poll() method using ep_call_nested, but there are several other unsafe calls to ->poll elsewhere that can be made to deadlock. For example, the following simple program causes the call in ep_insert recursively call the original fd's ->poll, leading to deadlock: #include <unistd.h> #include <sys/epoll.h> int main(void) { int e1, e2, p[2]; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create(1); e2 = epoll_create(2); pipe(p); epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt); epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt); write(p[1], p, sizeof p); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } On insertion, check whether the inserted file is itself a struct epoll, and if so, do a recursive walk to detect whether inserting this file would create a loop of epoll structures, which could lead to deadlock. [nelhage@ksplice.com: Use epmutex to serialize concurrent inserts] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Reported-by: Nelson Elhage <nelhage@ksplice.com> Tested-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02eCryptfs: Copy up lower inode attrs in getattrTyler Hicks
commit 55f9cf6bbaa682958a7dd2755f883b768270c3ce upstream. The lower filesystem may do some type of inode revalidation during a getattr call. eCryptfs should take advantage of that by copying the lower inode attributes to the eCryptfs inode after a call to vfs_getattr() on the lower inode. I originally wrote this fix while working on eCryptfs on nfsv3 support, but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp bug that was reported. https://bugs.launchpad.net/bugs/613873 Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02fs/partitions: Validate map_count in Mac partition tablesTimo Warns
commit fa7ea87a057958a8b7926c1a60a3ca6d696328ed upstream. Validate number of blocks in map and remove redundant variable. Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02s390: remove task_show_regsMartin Schwidefsky
commit 261cd298a8c363d7985e3482946edb4bfedacf98 upstream. task_show_regs used to be a debugging aid in the early bringup days of Linux on s390. /proc/<pid>/status is a world readable file, it is not a good idea to show the registers of a process. The only correct fix is to remove task_show_regs. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02xfs: fix untrusted inode number lookupDave Chinner
Upstream commit: 4536f2ad8b330453d7ebec0746c4374eadd649b1 Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode numbers during lookup") changes the inode lookup code to do btree lookups for untrusted inode numbers. This change made an invalid assumption about the alignment of inodes and hence incorrectly calculated the first inode in the cluster. As a result, some inode numbers were being incorrectly considered invalid when they were actually valid. The issue was not picked up by the xfstests suite because it always runs fsr and dump (the two utilities that utilise the bulkstat interface) on cache hot inodes and hence the lookup code in the cold cache path was not sufficiently exercised to uncover this intermittent problem. Fix the issue by relaxing the btree lookup criteria and then checking if the record returned contains the inode number we are lookup for. If it we get an incorrect record, then the inode number is invalid. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [dannf: Backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02xfs: remove block number from inode lookup codeDave Chinner
Upstream commit: 7b6259e7a83647948fa33a736cc832310c8d85aa The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [dannf: Backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTEDDave Chinner
Upstream commit: 1920779e67cbf5ea8afef317777c5bf2b8096188 Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02xfs: validate untrusted inode numbers during lookupDave Chinner
Upstream commit: 7124fe0a5b619d65b739477b3b55a20bf805b06d When we decode a handle or do a bulkstat lookup, we are using an inode number we cannot trust to be valid. If we are deleting inode chunks from disk (default noikeep mode), then we cannot trust the on disk inode buffer for any given inode number to correctly reflect whether the inode has been unlinked as the di_mode nor the generation number may have been updated on disk. This is due to the fact that when we delete an inode chunk, we do not write the clusters back to disk when they are removed - instead we mark them stale to avoid them being written back potentially over the top of something that has been subsequently allocated at that location. The result is that we can have locations of disk that look like they contain valid inodes but in reality do not. Hence we cannot simply convert the inode number to a block number and read the location from disk to determine if the inode is valid or not. As a result, and XFS_IGET_BULKSTAT lookup needs to actually look the inode up in the inode allocation btree to determine if the inode number is valid or not. It should be noted even on ikeep filesystems, there is the possibility that blocks on disk may look like valid inode clusters. e.g. if there are filesystem images hosted on the filesystem. Hence even for ikeep filesystems we really need to validate that the inode number is valid before issuing the inode buffer read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02xfs: always use iget in bulkstatChristoph Hellwig
Upstream commit: 7dce11dbac54fce777eea0f5fb25b2694ccd7900 The non-coherent bulkstat versionsthat look directly at the inode buffers causes various problems with performance optimizations that make increased use of just logging inodes. This patch makes bulkstat always use iget, which should be fast enough for normal use with the radix-tree based inode cache introduced a while ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> [dannf: backported to 2.6.32.y] Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02nfsd: correctly handle return value from nfsd_map_name_to_*NeilBrown
commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream. These functions return an nfs status, not a host_err. So don't try to convert before returning. This is a regression introduced by 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, but missed these two. Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02GFS2: Fix bmap allocation corner-case bugSteven Whitehouse
commit 07ccb7bf2c928fef4fea2cda69ba2e23479578db upstream. This patch solves a corner case during allocation which occurs if both metadata (indirect) and data blocks are required but there is an obstacle in the filesystem (e.g. a resource group header or another allocated block) such that when the allocation is requested only enough blocks for the metadata are returned. By changing the exit condition of this loop, we ensure that a minimum of one data block will always be returned. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02NFS: fix the return value of nfs_file_fsync()J. R. Okajima
commit 0702099bd86c33c2dcdbd3963433a61f3f503901 upstream. By the commit af7fa16 2010-08-03 NFS: Fix up the fsync code close(2) became returning the non-zero value even if it went well. nfs_file_fsync() should return 0 when "status" is positive. Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02sendfile(): check f_op.splice_write() rather than f_op.sendpage()Changli Gao
commit cc56f7de7f00d188c7c4da1e9861581853b9e92f upstream. sendfile(2) was reworked with the splice infrastructure, but it still checks f_op.sendpage() instead of f_op.splice_write() wrongly. Although if f_op.sendpage() exists, f_op.splice_write() always exists at the same time currently, the assumption will be broken in future silently. This patch also brings a side effect: sendfile(2) can work with any output file. Some security checks related to f_op are added too. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Cc: Przemyslaw Pawelczyk <przemyslaw@pawelczyk.it>
2011-03-02CRED: Fix kernel panic upon security_file_alloc() failure.Tetsuo Handa
commit 78d2978874e4e10e97dfd4fd79db45bdc0748550 upstream. In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL when security_file_alloc() returned an error. As a result, kernel will panic() due to put_cred(NULL) call within RCU callback. Fix this bug by assigning f->f_cred before calling security_file_alloc(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02CRED: Fix get_task_cred() and task_state() to not resurrect dead credentialsDavid Howells
commit de09a9771a5346029f4d11e4ac886be7f9bfdd75 upstream. It's possible for get_task_cred() as it currently stands to 'corrupt' a set of credentials by incrementing their usage count after their replacement by the task being accessed. What happens is that get_task_cred() can race with commit_creds(): TASK_1 TASK_2 RCU_CLEANER -->get_task_cred(TASK_2) rcu_read_lock() __cred = __task_cred(TASK_2) -->commit_creds() old_cred = TASK_2->real_cred TASK_2->real_cred = ... put_cred(old_cred) call_rcu(old_cred) [__cred->usage == 0] get_cred(__cred) [__cred->usage == 1] rcu_read_unlock() -->put_cred_rcu() [__cred->usage == 1] panic() However, since a tasks credentials are generally not changed very often, we can reasonably make use of a loop involving reading the creds pointer and using atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero. If successful, we can safely return the credentials in the knowledge that, even if the task we're accessing has released them, they haven't gone to the RCU cleanup code. We then change task_state() in procfs to use get_task_cred() rather than calling get_cred() on the result of __task_cred(), as that suffers from the same problem. Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be tripped when it is noticed that the usage count is not zero as it ought to be, for example: kernel BUG at kernel/cred.c:168! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/kernel/mm/ksm/run CPU 0 Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745 RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45 RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0 RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0 R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001 FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0) Stack: ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45 <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000 <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246 Call Trace: [<ffffffff810698cd>] put_cred+0x13/0x15 [<ffffffff81069b45>] commit_creds+0x16b/0x175 [<ffffffff8106aace>] set_current_groups+0x47/0x4e [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75 RIP [<ffffffff81069881>] __put_cred+0xc/0x45 RSP <ffff88019e7e9eb8> ---[ end trace df391256a100ebdd ]--- Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>