summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-12-21ext4: avoid hangs in ext4_da_should_update_i_disksize()Andrea Arcangeli
commit ea51d132dbf9b00063169c1159bee253d9649224 upstream. If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21hfs: fix hfs_find_init() sb->ext_tree NULL ptr oopsPhillip Lougher
commit 434a964daa14b9db083ce20404a4a2add54d037a upstream. Clement Lecigne reports a filesystem which causes a kernel oops in hfs_find_init() trying to dereference sb->ext_tree which is NULL. This proves to be because the filesystem has a corrupted MDB extent record, where the extents file does not fit into the first three extents in the file record (the first blocks). In hfs_get_block() when looking up the blocks for the extent file (HFS_EXT_CNID), it fails the first blocks special case, and falls through to the extent code (which ultimately calls hfs_find_init()) which is in the process of being initialised. Hfs avoids this scenario by always having the extents b-tree fitting into the first blocks (the extents B-tree can't have overflow extents). The fix is to check at mount time that the B-tree fits into first blocks, i.e. fail if HFS_I(inode)->alloc_blocks >= HFS_I(inode)->first_blocks Note, the existing commit 47f365eb57573 ("hfs: fix oops on mount with corrupted btree extent records") becomes subsumed into this as a special case, but only for the extents B-tree (HFS_EXT_CNID), it is perfectly acceptable for the catalog B-Tree file to grow beyond three extents, with the remaining extent descriptors in the extents overfow. This fixes CVE-2011-2203 Reported-by: Clement LECIGNE <clement.lecigne@netasq.com> Signed-off-by: Phillip Lougher <plougher@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Moritz Mühlenhoff <jmm@inutil.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21jbd/jbd2: validate sb->s_first in journal_get_superblock()Eryu Guan
commit 8762202dd0d6e46854f786bdb6fb3780a1625efe upstream. I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3 image has s_first = 0 in journal superblock, and the 0 is passed to journal->j_head in journal_reset(), then to blocknr in cleanup_journal_tail(), in the end the J_ASSERT failed. So validate s_first after reading journal superblock from disk in journal_get_superblock() to ensure s_first is valid. The following script could reproduce it: fstype=ext3 blocksize=1024 img=$fstype.img offset=0 found=0 magic="c0 3b 39 98" dd if=/dev/zero of=$img bs=1M count=8 mkfs -t $fstype -b $blocksize -F $img filesize=`stat -c %s $img` while [ $offset -lt $filesize ] do if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then echo "Found journal: $offset" found=1 break fi offset=`echo "$offset+$blocksize" | bc` done if [ $found -ne 1 ];then echo "Magic \"$magic\" not found" exit 1 fi dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1 mkdir -p ./mnt mount -o loop $img ./mnt Cc: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Moritz Mühlenhoff <jmm@inutil.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09cifs: fix cifs stable patch cifs-fix-oplock-break-handling-try-2.patchSuresh Jayaraman
The stable release 2.6.32.32 added the upstream commit 12fed00de963433128b5366a21a55808fab2f756. However, one of the hunks of the original patch seems missing from the stable backport which can be found here: http://permalink.gmane.org/gmane.linux.kernel.stable/5676 This hunk corresponds to the change in is_valid_oplock_break() at fs/cifs/misc.c. This patch backports the missing hunk and is against linux-2.6.32.y stable kernel. Cc: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09eCryptfs: Extend array bounds for all filename charsTyler Hicks
commit 0f751e641a71157aa584c2a2e22fda52b52b8a56 upstream. From mhalcrow's original commit message: Characters with ASCII values greater than the size of filename_rev_map[] are valid filename characters. ecryptfs_decode_from_filename() will access kernel memory beyond that array, and ecryptfs_parse_tag_70_packet() will then decrypt those characters. The attacker, using the FNEK of the crafted file, can then re-encrypt the characters to reveal the kernel memory past the end of the filename_rev_map[] array. I expect low security impact since this array is statically allocated in the text area, and the amount of memory past the array that is accessible is limited by the largest possible ASCII filename character. This patch solves the issue reported by mhalcrow but with an implementation suggested by Linus to simply extend the length of filename_rev_map[] to 256. Characters greater than 0x7A are mapped to 0x00, which is how invalid characters less than 0x7A were previously being handled. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26hfs: add sanity check for file name lengthDan Carpenter
commit bc5b8a9003132ae44559edd63a1623b7b99dfb68 upstream. On a corrupted file system the ->len field could be wrong leading to a buffer overflow. Reported-and-acked-by: Clement LECIGNE <clement.lecigne@netasq.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodesTheodore Ts'o
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream. This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07ext4: fix BUG_ON() in ext4_ext_insert_extent()Zheng Liu
Does not corrispond with a direct commit in Linus's tree as it was fixed differently in the 3.0 release. We will meet with a BUG_ON() if following script is run. mkfs.ext4 -b 4096 /dev/sdb1 1000000 mount -t ext4 /dev/sdb1 /mnt/sdb1 fallocate -l 100M /mnt/sdb1/test sync for((i=0;i<170;i++)) do dd if=/dev/zero of=/mnt/sdb1/test conv=notrunc bs=256k count=1 seek=`expr $i \* 2` done umount /mnt/sdb1 mount -t ext4 /dev/sdb1 /mnt/sdb1 dd if=/dev/zero of=/mnt/sdb1/test conv=notrunc bs=256k count=1 seek=341 umount /mnt/sdb1 mount /dev/sdb1 /mnt/sdb1 dd if=/dev/zero of=/mnt/sdb1/test conv=notrunc bs=256k count=1 seek=340 sync The reason is that it forgot to mark dirty when splitting two extents in ext4_ext_convert_to_initialized(). Althrough ex has been updated in memory, it is not dirtied both in ext4_ext_convert_to_initialized() and ext4_ext_insert_extent(). The disk layout is corrupted. Then it will meet with a BUG_ON() when writting at the start of that extent again. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Xiaoyun Mao <xiaoyun.maoxy@aliyun-inc.com> Cc: Yingbin Wang <yingbin.wangyb@aliyun-inc.com> Cc: Jia Wan <jia.wanj@aliyun-inc.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07NLM: Don't hang forever on NLM unlock requestsTrond Myklebust
commit 0b760113a3a155269a3fba93a409c640031dd68f upstream. If the NLM daemon is killed on the NFS server, we can currently end up hanging forever on an 'unlock' request, instead of aborting. Basically, if the rpcbind request fails, or the server keeps returning garbage, we really want to quit instead of retrying. Tested-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07deal with races in /proc/*/{syscall,stack,personality}Al Viro
commit a9712bc12c40c172e393f85a9b2ba8db4bf59509 upstream. All of those are rw-r--r-- and all are broken for suid - if you open a file before the target does suid-root exec, you'll be still able to access it. For personality it's not a big deal, but for syscall and stack it's a real problem. Fix: check that task is tracable for you at the time of read(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07kcore: fix test for end of listDan Carpenter
commit 4fd2c20d964a8fb9861045f1022475c9d200d684 upstream. "m" is never NULL here. We need a different test for the end of list condition. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Leonardo Chiquitto <leonardo.lists@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07nfsd4: ignore WANT bits in open downgradeJ. Bruce Fields
commit c30e92df30d7d5fe65262fbce5d1b7de675fe34e upstream. We don't use WANT bits yet--and sending them can probably trigger a BUG() further down. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir()Bernd Schubert
commit 832023bffb4b493f230be901f681020caf3ed1f8 upstream. Fan Yong <yong.fan@whamcloud.com> noticed setting FMODE_32bithash wouldn't work with nfsd v4, as nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530 cookies have a 64 bit type and cookies are also defined as u64 in 'struct nfsd4_readdir'. So remove the test for >32-bit values. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07epoll: fix spurious lockdep warningsNelson Elhage
commit d8805e633e054c816c47cb6e727c81f156d9253d upstream. epoll can acquire recursively acquire ep->mtx on multiple "struct eventpoll"s at once in the case where one epoll fd is monitoring another epoll fd. This is perfectly OK, since we're careful about the lock ordering, but it causes spurious lockdep warnings. Annotate the recursion using mutex_lock_nested, and add a comment explaining the nesting rules for good measure. Recent versions of systemd are triggering this, and it can also be demonstrated with the following trivial test program: --------------------8<-------------------- int main(void) { int e1, e2; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create1(0); e2 = epoll_create1(0); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } --------------------8<-------------------- Reported-by: Paul Bolle <pebolle@tiscali.nl> Tested-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Nelson Elhage <nelhage@nelhage.com> Acked-by: Jason Baron <jbaron@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07splice: direct_splice_actor() should not use pos in sdChangli Gao
commit 2cb4b05e7647891b46b91c07c9a60304803d1688 upstream. direct_splice_actor() shouldn't use sd->pos, as sd->pos is for file reading, file->f_pos should be used instead. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07cifs: fix possible memory corruption in CIFSFindNextJeff Layton
commit 9438fabb73eb48055b58b89fc51e0bc4db22fabd upstream. The name_len variable in CIFSFindNext is a signed int that gets set to the resume_name_len in the cifs_search_info. The resume_name_len however is unsigned and for some infolevels is populated directly from a 32 bit value sent by the server. If the server sends a very large value for this, then that value could look negative when converted to a signed int. That would make that value pass the PATH_MAX check later in CIFSFindNext. The name_len would then be used as a length value for a memcpy. It would then be treated as unsigned again, and the memcpy scribbles over a ton of memory. Fix this by making the name_len an unsigned value in CIFSFindNext. Reported-by: Darren Lavender <dcl@hppine99.gbr.hp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29fuse: check size of FUSE_NOTIFY_INVAL_ENTRY messageMiklos Szeredi
commit c2183d1e9b3f313dd8ba2b1b0197c8d9fb86a7ae upstream. FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the message processing could overrun and result in a "kernel BUG at fs/fuse/dev.c:629!" Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29befs: Validate length of long symbolic links.Timo Warns
commit 338d0f0a6fbc82407864606f5b64b75aeb3c70f2 upstream. Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29fs/partitions/efi.c: corrupted GUID partition tables can cause kernel oopsTimo Warns
commit 3eb8e74ec72736b9b9d728bad30484ec89c91dde upstream. The kernel automatically evaluates partition tables of storage devices. The code for evaluating GUID partitions (in fs/partitions/efi.c) contains a bug that causes a kernel oops on certain corrupted GUID partition tables. This bug has security impacts, because it allows, for example, to prepare a storage device that crashes a kernel subsystem upon connecting the device (e.g., a "USB Stick of (Partial) Death"). crc = efi_crc32((const unsigned char *) (*gpt), le32_to_cpu((*gpt)->header_size)); computes a CRC32 checksum over gpt covering (*gpt)->header_size bytes. There is no validation of (*gpt)->header_size before the efi_crc32 call. A corrupted partition table may have large values for (*gpt)->header_size. In this case, the CRC32 computation access memory beyond the memory allocated for gpt, which may cause a kernel heap overflow. Validate value of GUID partition table header size. [akpm@linux-foundation.org: fix layout and indenting] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [dannf: backported to Debian's 2.6.32] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08proc: restrict access to /proc/PID/ioVasiliy Kulikov
commit 1d1221f375c94ef961ba8574ac4f85c8870ddd51 upstream. /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: check for NULL session passwordJeff Layton
commit 24e6cf92fde1f140d8eb0bf7cd24c2c78149b6b2 upstream. It's possible for a cifsSesInfo struct to have a NULL password, so we need to check for that prior to running strncmp on it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: fix NULL pointer dereference in cifs_find_smb_sesJeff Layton
commit fc87a40677bbe0937e2ff0642c7e83c9a4813f3d upstream. cifs_find_smb_ses assumes that the vol->password field is a valid pointer, but that's only the case if a password was passed in via the options string. It's possible that one won't be if there is no mount helper on the box. Reported-by: diabel <gacek-2004@wp.pl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08cifs: clean up cifs_find_smb_ses (try #2)Jeff Layton
commit 4ff67b720c02c36e54d55b88c2931879b7db1cd2 upstream. This patch replaces the earlier patch by the same name. The only difference is that MAX_PASSWORD_SIZE has been increased to attempt to match the limits that windows enforces. Do a better job of matching sessions by authtype. Matching by username for a Kerberos session is incorrect, and anonymous sessions need special handling. Also, in the case where we do match by username, we also need to match by password. That ensures that someone else doesn't "borrow" an existing session without needing to know the password. Finally, passwords can be longer than 16 bytes. Bump MAX_PASSWORD_SIZE to 512 to match the size that the userspace mount helper allows. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> [dannf: backported to Debian's 2.6.32] Cc: Moritz Muehlenhoff <jmm@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08Revert "block: rescan partitions on invalidated devices on -ENOMEDIA too"Greg Kroah-Hartman
This reverts commit 5b2745db12a3f97a9ec9efd4ffa077da707d3e4c (commit 02e352287a40bd456eb78df705bf888bc3161d3f upstream) This should have only been commited on .38 and newer, not older kernels like this one, sorry. Cc: Tejun Heo <tj@kernel.org> Cc: David Zeuthen <zeuthen@gmail.com> Cc: Martin Pitt <martin.pitt@ubuntu.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08ext3: Fix oops in ext3_try_to_allocate_with_rsv()Jan Kara
commit ad95c5e9bc8b5885f94dce720137cac8fa8da4c9 upstream. Block allocation is called from two places: ext3_get_blocks_handle() and ext3_xattr_block_set(). These two callers are not necessarily synchronized because xattr code holds only xattr_sem and i_mutex, and ext3_get_blocks_handle() may hold only truncate_mutex when called from writepage() path. Block reservation code does not expect two concurrent allocations to happen to the same inode and thus assertions can be triggered or reservation structure corruption can occur. Fix the problem by taking truncate_mutex in xattr code to serialize allocations. CC: Sage Weil <sage@newdream.net> Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08NFSv4.1: update nfs4_fattr_bitmap_maxszAndy Adamson
commit e5012d1f3861d18c7f3814e757c1c3ab3741dbcd upstream. Attribute IDs assigned in RFC 5661 now require three bitmaps. Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13mm: prevent concurrent unmap_mapping_range() on the same inodeMiklos Szeredi
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream. Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23exec: delay address limit change until point of no returnMathias Krause
commit dac853ae89043f1b7752875300faf614de43c74b upstream. Unconditionally changing the address limit to USER_DS and not restoring it to its old value in the error path is wrong because it prevents us using kernel memory on repeated calls to this function. This, in fact, breaks the fallback of hard coded paths to the init program from being ever successful if the first candidate fails to load. With this patch applied switching to USER_DS is delayed until the point of no return is reached which makes it possible to have a multi-arch rootfs with one arch specific init binary for each of the (hard coded) probed paths. Since the address limit is already set to USER_DS when start_thread() will be invoked, this redundancy can be safely removed. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23xfs: properly account for reclaimed inodesJohannes Weiner
commit 081003fff467ea0e727f66d5d435b4f473a789b3 upstream. When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com> Backported-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23oprofile, dcookies: Fix possible circular locking dependencyRobert Richter
commit fe47ae7f53e179d2ef6771024feb000cbb86640f upstream. The lockdep warning below detects a possible A->B/B->A locking dependency of mm->mmap_sem and dcookie_mutex. The order in sync_buffer() is mm->mmap_sem/dcookie_mutex, while in sys_lookup_dcookie() it is vice versa. Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before copy_to_user(). oprofiled/4432 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3 but task is already holding lock: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dcookie_mutex){+.+.+.}: [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309 [<ffffffff81124e5c>] get_dcookie+0x30/0x144 [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile] [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile] [<ffffffff81467b96>] notifier_call_chain+0x37/0x63 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c [<ffffffff81039e8f>] do_exit+0x2a/0x6fc [<ffffffff8103a5e4>] do_group_exit+0x83/0xae [<ffffffff8103a626>] sys_exit_group+0x17/0x1b [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by oprofiled/4432: #0: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 stack backtrace: Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9 Call Trace: [<ffffffff81063193>] print_circular_bug+0xae/0xbc [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff810d7d54>] ? path_put+0x22/0x27 [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b References: https://bugzilla.kernel.org/show_bug.cgi?id=13809 Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23fat: Fix corrupt inode flags when remove ATTR_SYS flagOGAWA Hirofumi
commit 1adffbae22332bb558c2a29de19d9aca391869f6 upstream. We are clearly missing '~' in fat_ioctl_set_attributes(). Reported-by: Dmitry Dmitriev <dimondmm@yandex.ru> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix memory leak on error pathArtem Bityutskiy
commit 812eb258311f89bcd664a34a620f249d54a2cd83 upstream. UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write failure because it forgets to free the 'struct ubifs_dent_node *dent' object. Although the object is small, the alignment can make it large - e.g., 2KiB if the min. I/O unit is 2KiB. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix shrinker object count reportsArtem Bityutskiy
commit cf610bf4199770420629d3bc273494bd27ad6c1d upstream. Sometimes VM asks the shrinker to return amount of objects it can shrink, and we return the ubifs_clean_zn_cnt in that case. However, it is possible that this counter is negative for a short period of time, due to the way UBIFS TNC code updates it. And I can observe the following warnings sometimes: shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788 This patch makes sure UBIFS never returns negative count of objects. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23UBIFS: fix a rare memory leak in ro to rw remounting pathArtem Bityutskiy
commit eaeee242c531cd4b0a4a46e8b5dd7ef504380c42 upstream. When re-mounting from R/O mode to R/W mode and the LEB count in the superblock is not up-to date, because for the underlying UBI volume became larger, we re-write the superblock. We allocate RAM for these purposes, but never free it. So this is a memory leak, although very rare one. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23eCryptfs: Allow 2 scatterlist entries for encrypted filenamesTyler Hicks
commit 8d08dab786ad5cc2aca2bf870de370144b78c85a upstream. The buffers allocated while encrypting and decrypting long filenames can sometimes straddle two pages. In this situation, virt_to_scatterlist() will return -ENOMEM, causing the operation to fail and the user will get scary error messages in their logs: kernel: ecryptfs_write_tag_70_packet: Internal error whilst attempting to convert filename memory to scatterlist; expected rc = 1; got rc = [-12]. block_aligned_filename_size = [272] kernel: ecryptfs_encrypt_filename: Error attempting to generate tag 70 packet; rc = [-12] kernel: ecryptfs_encrypt_and_encode_filename: Error attempting to encrypt filename; rc = [-12] kernel: ecryptfs_lookup: Error attempting to encrypt and encode filename; rc = [-12] The solution is to allow up to 2 scatterlist entries to be used. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23Fix for buffer overflow in ldm_frag_add not sufficientTimo Warns
commit cae13fe4cc3f24820ffb990c09110626837e85d4 upstream. As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer overflow in ldm_frag_add) is not sufficient. The original patch in commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted partition table") does not consider that, for subsequent fragments, previously allocated memory is used. [1] http://lkml.org/lkml/2011/5/6/407 Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23ext4: release page cache in ext4_mb_load_buddy error pathYang Ruirui
commit 26626f1172fb4f3f323239a6a5cf4e082643fa46 upstream. Add missing page_cache_release in the error path of ext4_mb_load_buddy Signed-off-by: Yang Ruirui <ruirui.r.yang@tieto.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23jbd: fix fsync() tid wraparound bugTed Ts'o
commit d9b01934d56a96d9f4ae2d6204d4ea78a36f5f36 upstream. If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that is valid for us to start: namely, the currently running transaction (if it exists). Reported-by: Martin_Zielinski@McAfee.com Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23jbd: Fix forever sleeping process in do_get_write_access()Jan Kara
commit 2842bb20eed2e25cde5114298edc62c8883a1d9a upstream. In do_get_write_access() we wait on BH_Unshadow bit for buffer to get from shadow state. The waking code in journal_commit_transaction() has a bug because it does not issue a memory barrier after the buffer is moved from the shadow state and before wake_up_bit() is called. Thus a waitqueue check can happen before the buffer is actually moved from the shadow state and waiting process may never be woken. Fix the problem by issuing proper barrier. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23ext3: Fix fs corruption when make_indexed_dir() failsJan Kara
commit 86c4f6d85595cd7da635dc6985d27bfa43b1ae10 upstream. When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated block for index tree root, we did not properly mark all changed buffers dirty. This lead to only some of these buffers being written out and thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error failure case. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23block: rescan partitions on invalidated devices on -ENOMEDIA tooTejun Heo
commit 02e352287a40bd456eb78df705bf888bc3161d3f upstream. __blkdev_get() doesn't rescan partitions if disk->fops->open() fails, which leads to ghost partition devices lingering after medimum removal is known to both the kernel and userland. The behavior also creates a subtle inconsistency where O_NONBLOCK open, which doesn't fail even if there's no medium, clears the ghots partitions, which is exploited to work around the problem from userland. Fix it by updating __blkdev_get() to issue partition rescan after -ENOMEDIA too. This was reported in the following bz. https://bugzilla.kernel.org/show_bug.cgi?id=13029 Stable: 2.6.38 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Zeuthen <zeuthen@gmail.com> Reported-by: Martin Pitt <martin.pitt@ubuntu.com> Reported-by: Kay Sievers <kay.sievers@vrfy.org> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23cifs: add fallback in is_path_accessible for old serversJeff Layton
commit 221d1d797202984cb874e3ed9f1388593d34ee22 upstream. The is_path_accessible check uses a QPathInfo call, which isn't supported by ancient win9x era servers. Fall back to an older SMBQueryInfo call if it fails with the magic error codes. Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23CIFS: Fix memory over bound bug in cifs_parse_mount_optionsPavel Shilovsky
commit 4906e50b37e6f6c264e7ee4237343eb2b7f8d16d upstream. While password processing we can get out of options array bound if the next character after array is delimiter. The patch adds a check if we reach the end. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23Validate size of EFI GUID partition entries.Timo Warns
commit fa039d5f6b126fbd65eefa05db2f67e44df8f121 upstream. Otherwise corrupted EFI partition tables can cause total confusion. Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23cifs: check for bytes_remaining going to zero in CIFS_SessSetupJeff Layton
commit fcda7f4578bbf9717444ca6da8a421d21489d078 upstream. It's possible that when we go to decode the string area in the SESSION_SETUP response, that bytes_remaining will be 0. Decrementing it at that point will mean that it can go "negative" and wrap. Check for a bytes_remaining value of 0, and don't try to decode the string area if that's the case. Reported-and-Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09btrfs: Require CAP_SYS_ADMIN for filesystem rebalanceBen Hutchings
commit 6f88a4403def422bd8e276ddf6863d6ac71435d2 upstream. Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09cifs: fix another memleak, in cifs_root_igetOskar Schirmer
commit a7851ce73b9fdef53f251420e6883cf4f3766534 upstream. cifs_root_iget allocates full_path through cifs_build_path_to_root, but fails to kfree it upon cifs_get_inode_info* failure. Make all failure exit paths traverse clean up handling at the end of the function. Signed-off-by: Oskar Schirmer <oskar@scara.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09Increase OSF partition limit from 8 to 18Linus Torvalds
commit 34d211a2d5df4984a35b18d8ccacbe1d10abb067 upstream. It turns out that while a maximum of 8 partitions may be what people "should" have had, you can actually fit up to 18 entries(*) in a sector. And some people clearly were taking advantage of that, like Michael Cree, who had ten partitions on one of his OSF disks. (*) The OSF partition data starts at byte offset 64 in the first sector, and the array of 16-byte partition entries start at offset 148 in the on-disk partition structure. Reported-by: Michael Cree <mcree@orcon.net.nz> Cc: stable@kernel.org (v2.6.38) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09Fix corrupted OSF partition table parsingTimo Warns
commit 1eafbfeb7bdf59cfe173304c76188f3fd5f1fd05 upstream. The kernel automatically evaluates partition tables of storage devices. The code for evaluating OSF partitions contains a bug that leaks data from kernel heap memory to userspace for certain corrupted OSF partitions. In more detail: for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) { iterates from 0 to d_npartitions - 1, where d_npartitions is read from the partition table without validation and partition is a pointer to an array of at most 8 d_partitions. Add the proper and obvious validation. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org [ Changed the patch trivially to not repeat the whole le16_to_cpu() thing, and to use an explicit constant for the magic value '8' ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09nfs: fix compilation warningJovi Zhang
commit 43b7c3f051dea504afccc39bcb56d8e26c2e0b77 upstream. this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>