summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-08-10CIFS: Fix compile error with __init in cifs_init_dns_resolver() definitionMichael Neuling
An allmodconfig compile on ppc64 with 2.6.32.17 currently gives this error fs/cifs/dns_resolve.h:27: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'cifs_init_dns_resolver' This adds the correct header file to fix this. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10CIFS: Remove __exit mark from cifs_exit_dns_resolver()David Howells
commit 51c20fcced5badee0e2021c6c89f44aa3cbd72aa upstream. Remove the __exit mark from cifs_exit_dns_resolver() as it's called by the module init routine in case of error, and so may have been discarded during linkage. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10GFS2: rename causes kernel OopsBob Peterson
commit 728a756b8fcd22d80e2dbba8117a8a3aafd3f203 upstream. This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10xfs: prevent swapext from operating on write-only filesDan Rosenberg
commit 1817176a86352f65210139d4c794ad2d19fc6b63 upstream. This patch prevents user "foo" from using the SWAPEXT ioctl to swap a write-only file owned by user "bar" into a file owned by "foo" and subsequently reading it. It does so by checking that the file descriptors passed to the ioctl are also opened for reading. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10NFS: kswapd must not block in nfs_release_pageTrond Myklebust
commit b608b283a962caaa280756bc8563016a71712acf upstream See https://bugzilla.kernel.org/show_bug.cgi?id=16056 If other processes are blocked waiting for kswapd to free up some memory so that they can make progress, then we cannot allow kswapd to block on those processes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10acl trouble after upgrading ubuntuJ. Bruce Fields
commit d327cf7449e6fd5cbac784c641770e9366faa386 upstream. Subject: nfs: fix acl decoding Commit 28f566942c6b1d929f5e240e69e7081b77b238d3 "NFS: use dynamically computed compound_hdr.replen for xdr_inline_pages offset" accidentally changed the amount of space to allow for the acl reply, resulting in an IO error on attempts to get an acl. Reported-by: Paul Rudin <paul@rudin.co.uk> Cc: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jeremy Kerr <jeremy.kerr@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ecryptfs: Bugfix for error related to ecryptfs_hash_bucketsAndre Osterhues
commit a6f80fb7b5986fda663d94079d3bba0937a6b6ff upstream. The function ecryptfs_uid_hash wrongly assumes that the second parameter to hash_long() is the number of hash buckets instead of the number of hash bits. This patch fixes that and renames the variable ecryptfs_hash_buckets to ecryptfs_hash_bits to make it clearer. Fixes: CVE-2010-2492 Signed-off-by: Andre Osterhues <aosterhues@escrypt.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02GFS2: Fix up system xattrsSteven Whitehouse
commit 2646a1f61a3b5525914757f10fa12b5b94713648 upstream. This code has been shamelessly stolen from XFS at the suggestion of Christoph Hellwig. I've not added support for cached ACLs so far... watch for that in a later patch, although this is designed in such a way that they should be easy to add. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Make fsync sync new parent directories in no-journal modeFrank Mayhar
commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c upstream (as of v2.6.34-git13) Add a new ext4 state to tell us when a file has been newly created; use that state in ext4_sync_file in no-journal mode to tell us when we need to sync the parent directory as well as the inode and data itself. This fixes a problem in which a panic or power failure may lose the entire file even when using fsync, since the parent directory entry is lost. Addresses-Google-Bug: #2480057 Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix compat EXT4_IOC_ADD_GROUPBen Hutchings
commit 4d92dc0f00a775dc2e1267b0e00befb783902fe7 upstream (as of v2.6.34-git13) struct ext4_new_group_input needs to be converted because u64 has only 32-bit alignment on some 32-bit architectures, notably i386. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Conditionally define compat ioctl numbersBen Hutchings
commit 899ad0cea6ad7ff4ba24b16318edbc3cbbe03fad upstream (as of v2.6.34-git13) It is unnecessary, and in general impossible, to define the compat ioctl numbers except when building the filesystem with CONFIG_COMPAT defined. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: restart ext4_ext_remove_space() after transaction restartDmitry Monakhov
commit 0617b83fa239db9743a18ce6cc0e556f4d0fd567 upstream (as of v2.6.34-git13) If i_data_sem was internally dropped due to transaction restart, it is necessary to restart path look-up because extents tree was possibly modified by ext4_get_block(). https://bugzilla.kernel.org/show_bug.cgi?id=15827 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warrantedTheodore Ts'o
commit 786ec7915e530936b9eb2e3d12274145cab7aa7d upstream (as of v2.6.34-git13) Dimitry Monakhov discovered an edge case where it was possible for the EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily. This is true; I have a test case that can be exercised via downloading and decompressing the file: wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2 bunzip2 eofblocks-fl-test-case.img dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 bs=1k count=1 conv=notrunc However, triggering it in real life is highly unlikely since it requires an extremely fragmented sparse file with a hole in exactly the right place in the extent tree. (It actually took quite a bit of work to generate this test case.) Still, it's nice to get even extreme corner cases to be correct, so this patch makes sure that we don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner case. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Avoid crashing on NULL ptr dereference on a filesystem errorTheodore Ts'o
commit f70f362b4a6fe47c239dbfb3efc0cc2c10e4f09c upstream (as of v2.6.34-git13) If the EOFBLOCK_FL flag is set when it should not be and the inode is zero length, then eh_entries is zero, and ex is NULL, so dereferencing ex to print ex->ee_block causes a kernel OOPS in ext4_ext_map_blocks(). On top of that, the error message which is printed isn't very helpful. So we fix this by printing something more explanatory which doesn't involve trying to print ex->ee_block. Addresses-Google-Bug: #2655740 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Use bitops to read/modify i_flags in struct ext4_inode_infoDmitry Monakhov
commit 12e9b892002d9af057655d35b44db8ee9243b0dc upstream (as of v2.6.34-git13) At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Show journal_checksum optionJan Kara
commit 39a4bade8c1826b658316d66ee81c09b0a4d7d42 upstream (as of v2.6.34-git13) We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: check for a good block group before loading buddy pagesCurt Wohlgemuth
commit 8a57d9d61a6e361c7bb159dda797672c1df1a691 upstream (as of v2.6.34-git13) This adds a new field in ext4_group_info to cache the largest available block range in a block group; and don't load the buddy pages until *after* we've done a sanity check on the block group. With large allocation requests (e.g., fallocate(), 8MiB) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocateNikanth Karthikesan
commit 6d19c42b7cf81c39632b6d4dbc514e8449bcd346 upstream (as of v2.6.34-git13) Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Remove extraneous newlines in ext4_msg() callsCurt Wohlgemuth
commit fbe845ddf368f77f86aa7500f8fd2690f54c66a8 upstream (as of v2.6.34-git13) Addresses-Google-Bug: #2562325 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: init statistics after journal recoveryDmitry Monakhov
commit 84061e07c5fbbbf9dc8aef8fb750fc3a2dfc31f3 upstream (as of v2.6.34-git13) Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: clean up inode bitmaps manipulation in ext4_free_inodeDmitry Monakhov
commit d17413c08cd2b1dd2bf2cfdbb0f7b736b2b2b15c upstrea (as of v2..34-git13) - Reorganize locking scheme to batch two atomic operation in to one. This also allow us to state what healthy group must obey following rule ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Note: this commit has been observed to fix fs corruption problems under heavy fs load Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Do not zero out uninitialized extents beyond i_sizeDmitry Monakhov
commit 21ca087a3891efab4d45488db8febee474d26c68 upstream (as of v2.6.34-git13) The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: don't scan/accumulate more pages than mballoc will allocateEric Sandeen
commit c445e3e0a5c2804524dec6e55f66d63f6bc5bc3e upstream (as of v2.6.34-git13) There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amt; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it call ext4_get_blocks() multiple times. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: stop issuing discards if not supported by deviceEric Sandeen
commit a30eec2a8650a77f754e84b2e15f062fe652baa7 upstream (as of v2.6.34-git13) Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: don't return to userspace after freezing the fs with a mutex heldEric Sandeen
commit 6b0310fbf087ad6e9e3b8392adca97cd77184084 upstream (as of v2.6.34-git13) ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: check s_log_groups_per_flex in online resize codeEric Sandeen
commit 42007efd569f1cf3bfb9a61da60ef6c2179508ca upstream (as of v2.6.34-git13) If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549 Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: fix quota accounting in case of fallocateDmitry Monakhov
commit 35121c9860316d7799cea0fbc359a9186e7c2747 upstream (as of v2.6.34-git13) allocated_meta_data is already included in 'used' variable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat modeChristian Borntraeger
commit b684b2ee9409f2890a8b3aea98525bbe5f84e276 upstream (as of v2.6.34-git13) I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat save, only types with fixed widths are used: { __u32 reserved; /* should be zero */ __u32 donor_fd; /* donor file descriptor */ __u64 orig_start; /* logical start offset in block for orig */ __u64 donor_start; /* logical start offset in block for donor */ __u64 len; /* block length to be moved */ __u64 moved_len; /* moved block length */ }; Lets just wire up EXT4_IOC_MOVE_EXT for the compat case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy()Jing Zhang
commit e39e07fdfd98be8650385f12a7b81d6adc547510 upstream (as of v2.6.34-git13) This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Remove unnecessary call to ext4_get_group_desc() in mballocJing Zhang
commit 62e823a2cba18509ee826d775270e8ef9071b5bc upstream (as of v2.6.34-git13) Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: fix memory leaks in error path handling of ext4_ext_zeroout()Jing Zhang
commit b720303df7352d4a7a1f61e467e0a124913c0d41 upstream (as of v2.6.34-git13) When EIO occurs after bio is submitted, there is no memory free operation for bio, which results in memory leakage. And there is also no check against bio_alloc() for bio. Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: check missed return value in ext4_sync_file()Dmitry Monakhov
commit 0671e704658b9f26f85e78d51176daa861f955c7 upstream (as of v2.6.34-git13) Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Issue the discard operation *before* releasing the blocks to be reusedTheodore Ts'o
commit b90f687018e6d6c77d981b09203780f7001407e5 upstream (as of v2.6.34-rc6) Otherwise, we can end up having data corruption because the blocks could get reused and then discarded! https://bugzilla.kernel.org/show_bug.cgi?id=15579 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix buffer head leaks after calls to ext4_get_inode_loc()Curt Wohlgemuth
commit fd2dd9fbaf9e498ec63eef298921e36556f7214c upstream (as of v2.6.34-rc6) Calls to ext4_get_inode_loc() returns with a reference to a buffer head in iloc->bh. The callers of this function in ext4_write_inode() when in no journal mode and in ext4_xattr_fiemap() don't release the buffer head after using it. Addresses-Google-Bug: #2548165 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix possible lost inode write in no journal modeCurt Wohlgemuth
commit 8b472d739b2ddd8ab7fb278874f696cd95b25a5e upstream (as of v2.6.34-rc6) In the no-journal case, ext4_write_inode() will fetch the bh and call sync_dirty_buffer() on it. However, if the bh has already been written and the bh reclaimed for some other purpose, AND if the inode is the only one in the inode table block in use, then ext4_get_inode_loc() will not read the inode table block from disk, but as an optimization, fill the block with zero's assuming that its caller will copy in the on-disk version of the inode. This is not done by ext4_write_inode(), so the contents of the inode can simply get lost. The fix is to use __ext4_get_inode_loc() with in_mem set to 0, instead of ext4_get_inode_loc(). Long term the API needs to be fixed so it's obvious why latter is not safe. Addresses-Google-Bug: #2526446 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fixed inode allocator to correctly track a flex_bg's used_dirsEric Sandeen
commit c4caae25187ff3f5e837c6f04eb1acc2723c72d3 upstream (as of v2.6.34-rc3) When used_dirs was introduced for the flex_groups struct, it looks like the accounting was not put into place properly, in some places manipulating free_inodes rather than used_dirs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix estimate of # of blocks needed to write indirect-mapped filesJan Kara
commit d330a5befb88875a9b3d2db62f9b74dadf660b13 upstream (as of v2.6.34-rc3) http://bugzilla.kernel.org/show_bug.cgi?id=15420 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctlAkira Fujita
commit c437b2733520599a2c6e0dbcdeae611319f84707 upstream (as of v2.6.33-git11) a) Fix sparse warning in ext4_ioctl() b) Remove unneeded variable in mext_leaf_block() c) Fix spelling typo in mext_check_arguments() Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix the NULL reference in double_down_write_data_sem()Akira Fujita
commit 7247c0caa23d94a1cb6b307edba9dc45fb0798d4 upstream (as of v2.6.33-git11) If EXT4_IOC_MOVE_EXT ioctl is called with NULL donor_fd, fget() in ext4_ioctl() gets inappropriate file structure for donor; so we need to do this check earlier, before calling double_down_write_data_sem(). Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix insertion point of extent in mext_insert_across_blocks()Akira Fujita
commit 5fd5249aa36fad98c9fd5edced352939e54f9324 upstream (as of v2.6.33-git11) If the leaf node has 2 extent space or fewer and EXT4_IOC_MOVE_EXT ioctl is called with the file offset where after the 2nd extent covers, mext_insert_across_blocks() always tries to insert extent into the first extent. As a result, the file gets corrupted because of wrong extent order. The patch fixes this problem. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: make "offset" consistent in ext4_check_dir_entry()Toshiyuki Okajima
commit b8b8afe236e97b6359d46d3a3f8c46455e192271 upstream (as of v2.6.33-git11) The callers of ext4_check_dir_entry() usually pass in the "file offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock, ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf, ext4_delete_entry) only pass in the buffer offset. To accomodate those last two (which would be hard to fix otherwise), this patch changes ext4_check_dir_entry() to print the physical block number and the relative offset as well as the passed-in offset. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Handle non empty on-disk orphan linkDmitry Monakhov
commit 6e3617e579e070d3655a93ee9ed7149113e795e0 upstream (as of v2.6.33-git11) In case of truncate errors we explicitly remove inode from in-core orphan list via orphan_del(NULL, inode) without modifying the on-disk list. But later on, the same inode may be inserted in the orphan list again which will result the on-disk linked list getting corrupted. If inode i_dtime contains valid value, then skip on-disk list modification. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: explicitly remove inode from orphan list after failed direct ioDmitry Monakhov
commit da1dafca84413145f5ac59998b4cdd06fb89f721 upstream (as of v2.6.33-git11) Otherwise non-empty orphan list will be triggered on umount. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: fix error handling in migrateDmitry Monakhov
commit f39490bcd1691d65dc33689222a12e1fc13dd824 upstream (as of v2.6.33-git11) Set i_nlink to zero for temporary inode from very beginning. otherwise we may fail to start new journal handle and this inode will be unreferenced but with i_nlink == 1 Since we hold inode reference it can not be pruned. Also add missed journal_start retval check. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix fencepost error in chosing choosing group vs file preallocation.Tao Ma
commit cc483f102c3f703e853c96f95a654f0106fb2603 upstream (as of v2.6.33-git11) The ext4 multiblock allocator decides whether to use group or file preallocation based on the file size. When the file size reaches s_mb_stream_request (default is 16 blocks), it changes to use a file-specific preallocation. This is cool, but it has a tiny problem. See a simple script: mkfs.ext4 -b 1024 /dev/sda8 1000000 mount -t ext4 -o nodelalloc /dev/sda8 /mnt/ext4 for((i=0;i<5;i++)) do cat /mnt/4096>>/mnt/ext4/a #4096 is a file with 4096 characters. cat /mnt/4096>>/mnt/ext4/b done debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1 And you get BLOCKS: (0-14):8705-8719, (15):2356, (16-19):8465-8468 So there are 3 extents, a bit strange for the lonely 15th logical block. As we write to the 16 blocks, we choose file preallocation in ext4_mb_group_or_file, but in ext4_mb_normalize_request, we meet with the 16*1024 range, so no preallocation will be carried. file b then reserves the space after '2356', so when when write 16, we start from another part. This patch just change the check in ext4_mb_group_or_file, so that for the lonely 15 we will still use group preallocation. After the patch, we will get: debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1 BLOCKS: (0-15):8705-8720, (16-19):8465-8468 Looks more sane. Thanks. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Add flag to files with blocks intentionally past EOFJiaying Zhang
commit c8d46e41bc744c8fa0092112af3942fcd46c8b18 upstream (as of v2.6.33-git11) fallocate() may potentially instantiate blocks past EOF, depending on the flags used when it is called. e2fsck currently has a test for blocks past i_size, and it sometimes trips up - noticeably on xfstests 013 which runs fsstress. This patch from Jiayang does fix it up - it (along with e2fsprogs updates and other patches recently from Aneesh) has survived many fsstress runs in a row. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix BUG_ON at fs/buffer.c:652 in no journal modeCurt Wohlgemuth
commit 73b50c1c92666d326b5fa2c945d46509f2f6d91f upstream (as of v2.6.33-git11) Calls to ext4_handle_dirty_metadata should only pass in an inode pointer for inode-specific metadata, and not for shared metadata blocks such as inode table blocks, block group descriptors, the superblock, etc. The BUG_ON can get tripped when updating a special device (such as a block device) that is opened (so that i_mapping is set in fs/block_dev.c) and the file system is mounted in no journal mode. Addresses-Google-Bug: #2404870 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Use bitops to read/modify EXT4_I(inode)->i_stateTheodore Ts'o
commit 19f5fb7ad679bb361222c7916086435020c37cce upstream (as of v2.6.33-git11) At several places we modify EXT4_I(inode)->i_state without holding i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage, ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops which are atomic. Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flagAneesh Kumar K.V
commit 1296cc85c26e94eb865d03f82140f27d598de467 upstream (as of v2.6.33-rc6) We should update reserve space if it is delalloc buffer and that is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag. So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE [ Stable note: This fixes a corruption cuased by the following reproduction case: rm -f $TEST_FN touch $TEST_FN fallocate -n -o 656712 -l 858907 $TEST_FN dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=1011020 count=36983 sync dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=332121 count=24005 dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=1040179 count=93319 If the filesystem is then unmounted and e2fsck run forced, the i_blocks field for the file $TEST_FN will be found to be incorrect. ] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02ext4: Fix quota accounting error with fallocateAneesh Kumar K.V
commit 5f634d064c709ea02c3cdaa850a08323a4a4bf28 upstream (as of v2.6.33-rc6) When we fallocate a region of the file which we had recently written, and which is still in the page cache marked as delayed allocated blocks we need to make sure we don't do the quota update on writepage path. This is because the needed quota updated would have already be done by fallocate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>