summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2012-11-05ext4: fix unjournaled inode bitmap modificationEric Sandeen
commit ffb5387e85d528fb6d0d924abfa3fbf0fc484071 upstream. commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28ext4: Avoid underflow in ext4_trim_fs()Lukas Czerner
commit 5de35e8d5c02d271c20e18337e01bc20e6ef472e upstream. Currently if len argument in ext4_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28ext4: race-condition protection for ext4_convert_unwritten_extents_endioDmitry Monakhov
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream. We assumed that at the time we call ext4_convert_unwritten_extents_endio() extent in question is fully inside [map.m_lblk, map->m_len] because it was already split during submission. But this may not be true due to a race between writeback vs fallocate. If extent in question is larger than requested we will split it again. Special precautions should being done if zeroout required because [map.m_lblk, map->m_len] already contains valid data. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix fdatasync() for files with only i_size changesJan Kara
commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: always set i_op in ext4_mknod()Bernd Schubert
commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: online defrag is not supported for journaled filesDmitry Monakhov
commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: move_extent code cleanupDmitry Monakhov
commit 03bd8b9b896c8e940f282f540e6b4de90d666b7c upstream. - Remove usless checks, because it is too late to check that inode != NULL at the moment it was referenced several times. - Double lock routines looks very ugly and locking ordering relays on order of i_ino, but other kernel code rely on order of pointers. Let's make them simple and clean. - check that inodes belongs to the same SB as soon as possible. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix crash when accessing /proc/mounts concurrentlyHerton Ronaldo Krzesinski
commit 50df9fd55e4271e89a7adf3b1172083dd0ca199d upstream. The crash was caused by a variable being erronously declared static in token2str(). In addition to /proc/mounts, the problem can also be easily replicated by accessing /proc/fs/ext4/<partition>/options in parallel: $ cat /proc/fs/ext4/<partition>/options > options.txt ... and then running the following command in two different terminals: $ while diff /proc/fs/ext4/<partition>/options options.txt; do true; done This is also the cause of the following a crash while running xfstests #234, as reported in the following bug reports: https://bugs.launchpad.net/bugs/1053019 https://bugzilla.kernel.org/show_bug.cgi?id=47731 Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Brad Figg <brad.figg@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix potential deadlock in ext4_nonda_switch()Theodore Ts'o
commit 00d4e7362ed01987183e9528295de3213031309c upstream. In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [<c015b79c>] ? console_unlock+0x345/0x372 [<c06d62a1>] print_circular_bug+0x190/0x19d [<c019906c>] __lock_acquire+0x86d/0xb6c [<c01999db>] ? mark_held_locks+0x5c/0x7b [<c0199724>] lock_acquire+0x66/0xb9 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c06db935>] down_read+0x28/0x58 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4 [<c0271ece>] ext4_da_write_begin+0x27/0x193 [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205 [<c01ddce7>] generic_file_aio_write+0x78/0xd3 [<c026d336>] ext4_file_write+0x480/0x4a6 [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c [<c0180944>] ? sched_clock_cpu+0x11a/0x13e [<c01967e9>] ? trace_hardirqs_off+0xb/0xd [<c018099f>] ? local_clock+0x37/0x4e [<c0209f2c>] do_sync_write+0x67/0x9d [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44 [<c020a7b9>] vfs_write+0x7b/0xe6 [<c020a9a6>] sys_write+0x3b/0x64 [<c06dd4bd>] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: avoid duplicate writes of the backup bg descriptor blocksYongqiang Yang
commit 2ebd1704ded88a8ae29b5f3998b13959c715c4be upstream. The resize code was needlessly writing the backup block group descriptor blocks multiple times (once per block group) during an online resize. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: don't copy non-existent gdt blocks when resizingYongqiang Yang
commit 6df935ad2fced9033ab52078825fcaf6365f34b7 upstream. The resize code was copying blocks at the beginning of each block group in order to copy the superblock and block group descriptor table (gdt) blocks. This was, unfortunately, being done even for block groups that did not have super blocks or gdt blocks. This is a complete waste of perfectly good I/O bandwidth, to skip writing those blocks for sparse bg's. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: ignore last group w/o enough space when resizing instead of BUG'ingYongqiang Yang
commit 03c1c29053f678234dbd51bf3d65f3b7529021de upstream. If the last group does not have enough space for group tables, ignore it instead of calling BUG_ON(). Reported-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix kernel BUG on large-scale rm -rf commandsTheodore Ts'o
commit 89a4e48f8479f8145eca9698f39fe188c982212f upstream. Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit 968dee7722 was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix long mount times on very big file systemsTheodore Ts'o
commit 0548bbb85337e532ca2ed697c3e9b227ff2ed4b4 upstream. Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o
commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: make sure the journal sb is written in ext4_clear_journal_err()Theodore Ts'o
commit d796c52ef0b71a988364f6109aeb63d79c5b116b upstream. After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: undo ext4_calc_metadata_amount if we fail to claim spaceTheodore Ts'o
commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream. The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: don't let i_reserved_meta_blocks go negativeBrian Foster
commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: fix hole punch failure when depth is greater than 0Ashish Sangwan
commit 968dee77220768a5f52cf8b21d0bdb73486febef upstream. Whether to continue removing extents or not is decided by the return value of function ext4_ext_more_to_rm() which checks 2 conditions: a) if there are no more indexes to process. b) if the number of entries are decreased in the header of "depth -1". In case of hole punch, if the last block to be removed is not part of the last extent index than this index will not be deleted, hence the number of valid entries in the extent header of "depth - 1" will remain as it is and ext4_ext_more_to_rm will return 0 although the required blocks are not yet removed. This patch fixes the above mentioned problem as instead of removing the extents from the end of file, it starts removing the blocks from the particular extent from which removing blocks is actually required and continue backward until done. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: fix overhead calculation used by ext4_statfs()Theodore Ts'o
commit 952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream. Commit f975d6bcc7a introduced bug which caused ext4_statfs() to miscalculate the number of file system overhead blocks. This causes the f_blocks field in the statfs structure to be larger than it should be. This would in turn cause the "df" output to show the number of data blocks in the file system and the number of data blocks used to be larger than they should be. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: pass a char * to ext4_count_free() instead of a buffer_head ptrTheodore Ts'o
commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream. Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-29ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXTAl Viro
commit 331ae4962b975246944ea039697a8f1cadce42bb upstream. Caused, AFAICS, by mismerge in commit ff9cb1c4eead ("Merge branch 'for_linus' into for_linus_merged") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bgTheodore Ts'o
commit b0dd6b70f0fda17ae9762fbb72d98e40a4f66556 upstream. Ext3 filesystems that are converted to use as many ext4 file system features as possible will enable uninit_bg to speed up e2fsck times. These file systems will have a native ext3 layout of inode tables and block allocation bitmaps (as opposed to ext4's flex_bg layout). Unfortunately, in these cases, when first allocating a block in an uninitialized block group, ext4 would incorrectly calculate the number of free blocks in that block group, and then errorneously report that the file system was corrupt: EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd This problem can be reproduced via: mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g mount -t ext4 /dev/vdd /mnt fallocate -l 4600m /mnt/test The problem was caused by a bone headed mistake in the check to see if a particular metadata block was part of the block group. Many thanks to Kees Cook for finding and bisecting the buggy commit which introduced this bug (commit fd034a84e1, present since v3.2). Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: don't set i_flags in EXT4_IOC_SETFLAGSTao Ma
commit b22b1f178f6799278d3178d894f37facb2085765 upstream. Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to change the i_flags automatically but fails to remove the error setting of i_flags. So we still have the problem of trashing state flags. Fix this by removing the assignment. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: remove mb_groups before tearing down the buddy_cacheSalman Qazi
commit 95599968d19db175829fb580baa6b68939b320fb upstream. We can't have references held on pages in the s_buddy_cache while we are trying to truncate its pages and put the inode. All the pages must be gone before we reach clear_inode. This can only be gauranteed if we can prevent new users from grabbing references to s_buddy_cache's pages. The original bug can be reproduced and the bug fix can be verified by: while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \ umount /export/hda3/ram0; done & while true; do cat /proc/fs/ext4/ram0/mb_groups; done Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: add ext4_mb_unload_buddy in the error pathSalman Qazi
commit 02b7831019ea4e7994968c84b5826fa8b248ffc8 upstream. ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching ext4_mb_unload_buddy when it fails a memory allocation. Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: don't trash state flags in EXT4_IOC_SETFLAGSTheodore Ts'o
commit 79906964a187c405db72a3abc60eb9b50d804fbc upstream. In commit 353eb83c we removed i_state_flags with 64-bit longs, But when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags directly, which trashes the state flags which are stored in the high 32-bits of i_flags on 64-bit platforms. So use the the ext4_{set,clear}_inode_flags() functions which use atomic bit manipulation functions instead. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: add missing save_error_info() to ext4_error()Theodore Ts'o
commit f3fc0210c0fc91900766c995f089c39170e68305 upstream. The ext4_error() function is missing a call to save_error_info(). Since this is the function which marks the file system as containing an error, this oversight (which was introduced in 2.6.36) is quite significant, and should be backported to older stable kernels with high urgency. Reported-by: Ken Sumrall <ksumrall@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ksumrall@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: disallow hard-linked directory in ext4_lookupAndreas Dilger
commit 7e936b737211e6b54e34b71a827e56b872e958d8 upstream. A hard-linked directory to its parent can cause the VFS to deadlock, and is a sign of a corrupted file system. So detect this case in ext4_lookup(), before the rmdir() lockup scenario can take place. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: fix potential integer overflow in alloc_flex_gd()Haogang Chen
commit 967ac8af4475ce45474800709b12137aa7634c77 upstream. In alloc_flex_gd(), when flexbg_size is large, kmalloc size would overflow and flex_gd->groups would point to a buffer smaller than expected, causing OOB accesses when it is used. Note that in ext4_resize_fs(), flexbg_size is calculated using sbi->s_log_groups_per_flex, which is read from the disk and only bounded to [1, 31]. The patch returns NULL for too large flexbg_size. Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: force ro mount if ext4_setup_super() failsEric Sandeen
commit 7e84b6216467b84cd332c8e567bf5aa113fd2f38 upstream. If ext4_setup_super() fails i.e. due to a too-high revision, the error is logged in dmesg but the fs is not mounted RO as indicated. Tested by: # mkfs.ext4 -r 4 /dev/sdb6 # mount /dev/sdb6 /mnt/test # dmesg | grep "too high" [164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode # grep sdb6 /proc/mounts /dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ext4: fix potential NULL dereference in ext4_free_inodes_counts()Dan Carpenter
commit bb3d132a24cd8bf5e7773b2d9f9baa58b07a7dae upstream. The ext4_get_group_desc() function returns NULL on error, and ext4_free_inodes_count() function dereferences it without checking. There is a check on the next line, but it's too late. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-23Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "These are two low-risk bug fixes for ext4, fixing a compile warning and a potential deadlock." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: super.c: unused variable warning without CONFIG_QUOTA jbd2: use GFP_NOFS for blkdev_issue_flush
2012-04-23super.c: unused variable warning without CONFIG_QUOTAEldad Zack
sb info is only checked with quota support. fs/ext4/super.c: In function ‘parse_options’: fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable] Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-04-17Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 regression fixes from Ted Ts'o: "This fixes a scalability problem reported by Andi Kleen and Tim Chen; they were quite secretive about the precise nature of their workload, but they later admitted that it only showed up when they were using a large sparse file, so the amount of data I/O that was needed was close to zero. I'm not sure how realistic this is and it's only a regression if you consider changes made since 2.6.39 to be a "regression" vis-a-vis the policy regarding post-merge window bug fixes, but Linus agreed it was worth fixing, so I'm including it in this pull request. This also fixes the journalled quota mount options, which I accidentally broke while I was cleaning up the mount option handling." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix handling of journalled quota options ext4: address scalability issue by removing extent cache statistics
2012-04-16ext4: fix handling of journalled quota optionsTheodore Ts'o
Commit 26092bf5 broke handling of journalled quota mount options by trying to parse argument of every mount option as a number. Fix this by dealing with the quota options before we call match_int(). Thanks to Jan Kara for discovering this regression. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2012-04-16ext4: address scalability issue by removing extent cache statisticsTheodore Ts'o
Andi Kleen and Tim Chen have reported that under certain circumstances the extent cache statistics are causing scalability problems due to cache line bounces. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-04-13ext4: fix endianness breakage in ext4_split_extent_at()Al Viro
->ee_len is __le16, so assigning cpu_to_le32() to it is going to do Bad Things(tm) on big-endian hosts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-29Revert "ext4: don't release page refs in ext4_end_bio()"Linus Torvalds
This reverts commit b43d17f319f2c502b17139d1cf70731b2b62c644. Dave Jones reports that it causes lockups on his laptop, and his debug output showed a lot of processes hung waiting for page_writeback (or more commonly - processes hung waiting for a lock that was held during that writeback wait). The page_writeback hint made Ted suggest that Dave look at this commit, and Dave verified that reverting it makes his problems go away. Ted says: "That commit fixes a race which is seen when you write into fallocated (and hence uninitialized) disk blocks under *very* heavy memory pressure. Furthermore, although theoretically it could trigger under normal direct I/O writes, it only seems to trigger if you are issuing a huge number of AIO writes, such that a just-written page can get evicted from memory, and then read back into memory, before the workqueue has a chance to update the extent tree. This race has been around for a little over a year, and no one noticed until two months ago; it only happens under fairly exotic conditions, and in fact even after trying very hard to create a simple repro under lab conditions, we could only reproduce the problem and confirm the fix on production servers running MySQL on very fast PCIe-attached flash devices. Given that Dave was able to hit this problem pretty quickly, if we confirm that this commit is at fault, the only reasonable thing to do is to revert it IMO." Reported-and-tested-by: Dave Jones <davej@redhat.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-29Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd changes from Bruce Fields: Highlights: - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features, moving us closer to a complete 4.1 implementation. - Bernd Schubert fixed a long-standing problem with readdir cookies on ext2/3/4. - Jeff Layton performed a long-overdue overhaul of the server reboot recovery code which will allow us to deprecate the current code (a rather unusual user of the vfs), and give us some needed flexibility for further improvements. - Like the client, we now support numeric uid's and gid's in the auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x. Plus miscellaneous bugfixes and cleanup. Thanks to everyone! There are also some delegation fixes waiting on vfs review that I suppose will have to wait for 3.5. With that done I think we'll finally turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly symbolic as it's been on by default in distro's for a while). And the list of 4.1 todo's should be achievable for 3.5 as well: http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues though we may still want a bit more experience with it before turning it on by default. * 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits) nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled nfsd4: use auth_unix unconditionally on backchannel nfsd: fix NULL pointer dereference in cld_pipe_downcall nfsd4: memory corruption in numeric_name_to_id() sunrpc: skip portmap calls on sessions backchannel nfsd4: allow numeric idmapping nfsd: don't allow legacy client tracker init for anything but init_net nfsd: add notifier to handle mount/unmount of rpc_pipefs sb nfsd: add the infrastructure to handle the cld upcall nfsd: add a header describing upcall to nfsdcld nfsd: add a per-net-namespace struct for nfsd sunrpc: create nfsd dir in rpc_pipefs nfsd: add nfsd4_client_tracking_ops struct and a way to set it nfsd: convert nfs4_client->cl_cb_flags to a generic flags field NFSD: Fix nfs4_verifier memory alignment NFSD: Fix warnings when NFSD_DEBUG is not defined nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) nfsd: rename 'int access' to 'int may_flags' in nfsd_open() ext4: return 32/64-bit dir name hash according to usage type fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash ...
2012-03-28Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates for 3.4 from Ted Ts'o: "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes The changes to export dirty_writeback_interval are from Artem's s_dirt cleanup patch series. The same is true of the change to remove the s_dirt helper functions which never got used by anyone in-tree. I've run these changes by Al Viro, and am carrying them so that Artem can more easily fix up the rest of the file systems during the next merge window. (Originally we had hopped to remove the use of s_dirt from ext4 during this merge window, but his patches had some bugs, so I ultimately ended dropping them from the ext4 tree.)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits) vfs: remove unused superblock helpers mm: export dirty_writeback_interval ext4: remove useless s_dirt assignment ext4: write superblock only once on unmount ext4: do not mark superblock as dirty unnecessarily ext4: correct ext4_punch_hole return codes ext4: remove restrictive checks for EOFBLOCKS_FL ext4: always set then trimmed blocks count into len ext4: fix trimmed block count accunting ext4: fix start and len arguments handling in ext4_trim_fs() ext4: update s_free_{inodes,blocks}_count during online resize ext4: change some printk() calls to use ext4_msg() instead ext4: avoid output message interleaving in ext4_error_<foo>() ext4: remove trailing newlines from ext4_msg() and ext4_error() messages ext4: add no_printk argument validation, fix fallout ext4: remove redundant "EXT4-fs: " from uses of ext4_msg ext4: give more helpful error message in ext4_ext_rm_leaf() ext4: remove unused code from ext4_ext_map_blocks() ext4: rewrite punch hole to use ext4_ext_remove_space() jbd2: cleanup journal tail after transaction commit ...
2012-03-21ext4: remove useless s_dirt assignmentArtem Bityutskiy
Clean-up ext4 a tiny bit by removing useless s_dirt assignment in 'ext4_fill_super()' because a bit later we anyway call 'ext4_setup_super()' which writes the superblock to the media unconditionally. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: write superblock only once on unmountArtem Bityutskiy
In some rather rare cases it is possible that ext4 may the superblock to the media twice. This patch makes sure this does not happen. This should speed up unmounting in those cases. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: do not mark superblock as dirty unnecessarilyArtem Bityutskiy
Commit a0375156ca1041574b5d47cc7e32f10b891151b0 cleaned up superblock dirtying handling, but missed one place. This patch does what was intended: if we have the journal, then we update the superblock through the journal rather than doing this directly. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: correct ext4_punch_hole return codesAllison Henderson
ext4_punch_hole returns -ENOTSUPP but it should be using -EOPNOTSUPP Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: remove restrictive checks for EOFBLOCKS_FLLukas Czerner
We are going to remove the EOFBLOCKS_FL flag in the future, so this is the first part of the removal. We can not remove it entirely just now, since the e2fsck is still checking for it and it might cause headache to some people. Instead, remove the restrictive checks now and the rest later, when the new e2fsck code is out and common enough. This is also needed because punch hole already breaks the EOFBLOCKS_FL semantics, so it might cause the some troubles. So simply remove it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: always set then trimmed blocks count into lenLukas Czerner
Currently if the range to trim is too small, for example on 1K fs the request to trim the first block, then the 'range->len' is not set reporting wrong number of discarded block to the caller. Fix this by always setting the 'range->len' before we return. Note that when there is a failure (-EINVAL) caller can not depend on 'range->len' being set properly. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: fix trimmed block count accuntingLukas Czerner
Currently when there is not enough free blocks in the block group to discard (grp->bb_free < minlen) the 'trimmed' is bumped up anyway with the number of discarded blocks from the previous iteration. Fix this by bumping up 'trimmed' only if the ext4_trim_all_free() was actually run. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: fix start and len arguments handling in ext4_trim_fs()Lukas Czerner
The overflow can happen when we are calling get_group_no_and_offset() which stores the group number in the ext4_grpblk_t type which is actually int. However when the blocknr is big enough the group number might be bigger than ext4_grpblk_t resulting in overflow. This will most likely happen with FITRIM default argument len = ULLONG_MAX. Fix this by using "end" variable instead of "start+len" as it is easier to get right and specifically check that the end is not beyond the end of the file system, so we are sure that the result of get_group_no_and_offset() will not overflow. Otherwise truncate it to the size of the file system. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-20ext4: initialization of ext4_li_mtx needs to be done earlierAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>