summaryrefslogtreecommitdiff
path: root/fs/ext4/ialloc.c
AgeCommit message (Collapse)Author
2020-03-11ext4: fix potential race between s_flex_groups online resizing and accessSuraj Jitindar Singh
commit 7c990728b99ed6fbe9c75fc202fce1172d9916da upstream. During an online resize an array of s_flex_groups structures gets replaced so it can get enlarged. If there is a concurrent access to the array and this memory has been reused then this can lead to an invalid memory access. The s_flex_group array has been converted into an array of pointers rather than an array of structures. This is to ensure that the information contained in the structures cannot get out of sync during a resize due to an accessor updating the value in the old structure after it has been copied but before the array pointer is updated. Since the structures them- selves are no longer copied but only the pointers to them this case is mitigated. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.4.x Cc: stable@kernel.org # 4.9.x Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-08-15ext4: fix check to prevent initializing reserved inodesTheodore Ts'o
commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream. Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whiteney when running generic/230 and generic/231 in the the nojournal test case. Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03ext4: check for allocation block validity with block group lockedTheodore Ts'o
commit 8d5a803c6a6ce4ec258e31f76059ea5153ba46ef upstream. With commit 044e6e3d74a3: "ext4: don't update checksum of new initialized bitmaps" the buffer valid bit will get set without actually setting up the checksum for the allocation bitmap, since the checksum will get calculated once we actually allocate an inode or block. If we are doing this, then we need to (re-)check the verified bit after we take the block group lock. Otherwise, we could race with another process reading and verifying the bitmap, which would then complain about the checksum being invalid. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: only look at the bg_flags field if it is validTheodore Ts'o
commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descripts is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-01ext4: add validity checks for bitmap block numbersTheodore Ts'o
commit 7dac4a1726a9c64a517d595c40e95e2d0d135f6f upstream. An privileged attacker can cause a crash by mounting a crafted ext4 image which triggers a out-of-bounds read in the function ext4_valid_block_bitmap() in fs/ext4/balloc.c. This issue has been assigned CVE-2018-1093. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: don't update checksum of new initialized bitmapsTheodore Ts'o
commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream. When reading the inode or block allocation bitmap, if the bitmap needs to be initialized, do not update the checksum in the block group descriptor. That's because we're not set up to journal those changes. Instead, just set the verified bit on the bitmap block, so that it's not necessary to validate the checksum. When a block or inode allocation actually happens, at that point the checksum will be calculated, and update of the bg descriptor block will be properly journalled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30fscrypt: use ENOKEY when file cannot be created w/o keyEric Biggers
[ Upstream commit 54475f531bb8d7078f63c159e5e0615d486c498c ] As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-05ext4: remove old feature helpersKaho Ng
Use the ext4_{has,set,clear}_feature_* helpers to replace the old feature helpers. Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-07-26Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The major change this cycle is deleting ext4's copy of the file system encryption code and switching things over to using the copies in fs/crypto. I've updated the MAINTAINERS file to add an entry for fs/crypto listing Jaeguk Kim and myself as the maintainers. There are also a number of bug fixes, most notably for some problems found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also fixed is a writeback deadlock detected by generic/130, and some potential races in the metadata checksum code" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) ext4: verify extent header depth ext4: short-cut orphan cleanup on error ext4: fix reference counting bug on block allocation error MAINTAINRES: fs-crypto maintainers update ext4 crypto: migrate into vfs's crypto engine ext2: fix filesystem deadlock while reading corrupted xattr block ext4: fix project quota accounting without quota limits enabled ext4: validate s_reserved_gdt_blocks on mount ext4: remove unused page_idx ext4: don't call ext4_should_journal_data() on the journal inode ext4: Fix WARN_ON_ONCE in ext4_commit_super() ext4: fix deadlock during page writeback ext4: correct error value of function verifying dx checksum ext4: avoid modifying checksum fields directly during checksum verification ext4: check for extents that wrap around jbd2: make journal y2038 safe jbd2: track more dependencies on transaction commit jbd2: move lockdep tracking to journal_s jbd2: move lockdep instrumentation for jbd2 handles ext4: respect the nobarrier mount option in nojournal mode ...
2016-07-10ext4 crypto: migrate into vfs's crypto engineJaegeuk Kim
This patch removes the most parts of internal crypto codes. And then, it modifies and adds some ext4-specific crypt codes to use the generic facility. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-07fs: have submit_bh users pass in op and flags separatelyMike Christie
This has submit_bh users pass in the operation and flags separately, so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-30ext4: clean up error handling when orphan list is corruptedTheodore Ts'o
Instead of just printing warning messages, if the orphan list is corrupted, declare the file system is corrupted. If there are any reserved inodes in the orphaned inode list, declare the file system corrupted and stop right away to avoid doing more potential damage to the file system. Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-30ext4: fix hang when processing corrupted orphaned inode listTheodore Ts'o
If the orphaned inode list contains inode #5, ext4_iget() returns a bad inode (since the bootloader inode should never be referenced directly). Because of the bad inode, we end up processing the inode repeatedly and this hangs the machine. This can be reproduced via: mke2fs -t ext4 /tmp/foo.img 100 debugfs -w -R "ssv last_orphan 5" /tmp/foo.img mount -o loop /tmp/foo.img /mnt (But don't do this if you are using an unpatched kernel if you care about the system staying functional. :-) This bug was found by the port of American Fuzzy Lop into the kernel to find file system problems[1]. (Since it *only* happens if inode #5 shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not surprising that AFL needed two hours before it found it.) [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf Cc: stable@vger.kernel.org Reported by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09ext4: fix misspellings in comments.Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-11ext4: fix scheduling in atomic on group checksum failureJan Kara
When block group checksum is wrong, we call ext4_error() while holding group spinlock from ext4_init_block_bitmap() or ext4_init_inode_bitmap() which results in scheduling while in atomic. Fix the issue by calling ext4_error() later after dropping the spinlock. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-01-08ext4: adds project ID supportLi Xi
Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
2015-10-17ext4: make the bitmap read routines return real error codesDarrick J. Wong
Make the bitmap reaading routines return real error codes (EIO, EFSCORRUPTED, EFSBADCRC) which can then be reflected back to userspace for more precise diagnosis work. In particular, this means that mballoc no longer claims that we're out of memory if the block bitmaps become corrupt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17ext4: clean up feature test macros with predicate functionsDarrick J. Wong
Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2015-10-17ext4: call out CRC and corruption errors with specific error codesDarrick J. Wong
Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-07-23ext4: Handle error from dquot_initialize()Jan Kara
dquot_initialize() can now return error. Handle it where possible. Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.com>
2015-05-31ext4 crypto: encrypt tmpfile located in encryption protected directoryTheodore Ts'o
Factor out calls to ext4_inherit_context() and move them to __ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't calling calling ext4_inherit_context(), so the temporary file wasn't getting protected. Since the blocks for the tmpfile could end up on disk, they really should be protected if the tmpfile is created within the context of an encrypted directory. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-18ext4: clean up superblock encryption mode fieldsTheodore Ts'o
The superblock fields s_file_encryption_mode and s_dir_encryption_mode are vestigal, so remove them as a cleanup. While we're at it, allow file systems with both encryption and inline_data enabled at the same time to work correctly. We can't have encrypted inodes with inline data, but there's no reason to prohibit unencrypted inodes from using the inline data feature. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-16ext4 crypto: enable encryption feature flagTheodore Ts'o
Also add the test dummy encryption mode flag so we can more easily test the encryption patches using xfstests. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-12ext4 crypto: enable filename encryptionMichael Halcrow
Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12ext4 crypto: implement the ext4 encryption write pathMichael Halcrow
Pulls block_write_begin() into fs/ext4/inode.c because it might need to do a low-level read of the existing data, in which case we need to decrypt it. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-02ext4: remove unused header filesSheng Yong
Remove unused header files and header files which are included in ext4.h. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-30ext4: fix oops when loading block bitmap failedJan Kara
When we fail to load block bitmap in __ext4_new_inode() we will dereference NULL pointer in ext4_journal_get_write_access(). So check for error from ext4_read_block_bitmap(). Coverity-id: 989065 Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-10-13ext4: Replace open coded mdata csum feature to helper functionDmitry Monakhov
Besides the fact that this replacement improves code readability it also protects from errors caused direct EXT4_S(sb)->s_es manipulation which may result attempt to use uninitialized csum machinery. #Testcase_BEGIN IMG=/dev/ram0 MNT=/mnt mkfs.ext4 $IMG mount $IMG $MNT #Enable feature directly on disk, on mounted fs tune2fs -O metadata_csum $IMG # Provoke metadata update, likey result in OOPS touch $MNT/test umount $MNT #Testcase_END # Replacement script @@ expression E; @@ - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) + ext4_has_metadata_csum(E) https://bugzilla.kernel.org/show_bug.cgi?id=82201 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-07-12ext4: fix potential null pointer dereference in ext4_free_inodeNamjae Jeon
Fix potential null pointer dereferencing problem caused by e43bb4e612 ("ext4: decrement free clusters/inodes counters when block group declared bad") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2014-07-05ext4: fix unjournalled bg descriptor while initializing inode bitmapTheodore Ts'o
The first time that we allocate from an uninitialized inode allocation bitmap, if the block allocation bitmap is also uninitalized, we need to get write access to the block group descriptor before we start modifying the block group descriptor flags and updating the free block count, etc. Otherwise, there is the potential of a bad journal checksum (if journal checksums are enabled), and of the file system becoming inconsistent if we crash at exactly the wrong time. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-06-26ext4: decrement free clusters/inodes counters when block group declared badNamjae Jeon
We should decrement free clusters counter when block bitmap is marked as corrupt and free inodes counter when the allocation bitmap is marked as corrupt to avoid misunderstanding due to incorrect available size in statfs result. User can get immediately ENOSPC error from write begin without reaching for the writepages. Cc: Darrick J. Wong<darrick.wong@oracle.com> Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
2013-11-08ext4: use prandom_u32() instead of get_random_bytes()Theodore Ts'o
Many of the uses of get_random_bytes() do not actually need cryptographically secure random numbers. Replace those uses with a call to prandom_u32(), which is faster and which doesn't consume entropy from the /dev/random driver. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-08-28ext4: mark group corrupt on group descriptor checksumDarrick J. Wong
If the group descriptor fails validation, mark the whole blockgroup corrupt so that the inode/block allocators skip this group. The previous approach takes the risk of writing to a damaged group descriptor; hopefully it was never the case that the [ib]bitmap fields pointed to another valid block and got dirtied, since the memset would fill the page with 1s. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-08-28ext4: mark block group as corrupt on inode bitmap errorDarrick J. Wong
If we detect either a discrepancy between the inode bitmap and the inode counts or the inode bitmap fails to pass validation checks, mark the block group corrupt and refuse to allocate or deallocate inodes from the group. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-08-16ext4: avoid reusing recently deleted inodes in no journal modeTheodore Ts'o
In no journal mode, if an inode has recently been deleted, we shouldn't reuse it right away. Otherwise it's possible, after an unclean shutdown, to hit a situation where a recently deleted inode gets reused for some other purpose before the inode table block has been written to disk. However, if the directory entry has been updated, then the directory entry will be pointing at the old inode contents. E2fsck will make sure the file system is consistent after the unclean shutdown. However, if the recently deleted inode is a character mode device, or an inode with the immutable bit set, even after the file system has been fixed up by e2fsck, it can be possible for a *.pyc file to be pointing at a character mode device, and when python tries to open the *.pyc file, Hilarity Ensues. We could change all of userspace to be very suspicious about stat'ing files before opening them, and clearing the immutable flag if necessary --- or we can just avoid reusing an inode number if it has been recently deleted. Google-Bug-Id: 10017573 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-26ext4: make sure group number is bumped after a inode allocation raceTheodore Ts'o
When we try to allocate an inode, and there is a race between two CPU's trying to grab the same inode, _and_ this inode is the last free inode in the block group, make sure the group number is bumped before we continue searching the rest of the block groups. Otherwise, we end up searching the current block group twice, and we end up skipping searching the last block group. So in the unlikely situation where almost all of the inodes are allocated, it's possible that we will return ENOSPC even though there might be free inodes in that last block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-06-04ext4: provide wrappers for transaction reservation callsJan Kara
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-20ext4: mark all metadata I/O with REQ_METATheodore Ts'o
As Dave Chinner pointed out at the 2013 LSF/MM workshop, it's important that metadata I/O requests are marked as such to avoid priority inversions caused by I/O bandwidth throttling. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-19ext4: move quota initialization out of inode allocation transactionJan Kara
Inode allocation transaction is pretty heavy (246 credits with quotas and extents before previous patch, still around 200 after it). This is mostly due to credits required for allocation of quota structures (credits there are heavily overestimated but it's difficult to make better estimates if we don't want to wire non-trivial assumptions about quota format into filesystem). So move quota initialization out of allocation transaction. That way transaction for quota structure allocation will be started only if we need to look up quota structure on disk (rare) and furthermore it will be started for each quota type separately, not for all of them at once. This reduces maximum transaction size to 34 is most cases and to 73 in the worst case. [ Modified by tytso to clean up the cleanup paths for error handling. Also use a separate call to ext4_std_error() for each failure so it is easier for someone who is debugging a problem in this function to determine which function call failed. ] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-09ext4: fix usless declarationsDmitri Monakho
This patch should fix sparse complains about shadow declatations. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11ext4: use atomic64_t for the per-flexbg free_clusters countTheodore Ts'o
A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e4208f7e, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
2013-02-14ext4: use KERN_WARNING for warning messagesTheodore Ts'o
Some messages printed related to a WARN_ON(1) were printed using KERN_NOTICE. Use KERN_WARNING or ext4_warning() instead so that context related to the WARN_ON() is printed at the same printk warning level (and log files, etc.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-09ext4: start handle at the last possible moment when creating inodesTheodore Ts'o
In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle until the inode has been succesfully allocated. In order to do this, we need to start the handle in the ext4_new_inode(). So create a new variant of this function, ext4_new_inode_start_handle(), so the handle can be created at the last possible minute, before we need to modify the inode allocation bitmap block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-08ext4: pass context information to jbd2__journal_start()Theodore Ts'o
So we can better understand what bits of ext4 are responsible for long-running jbd2 handles, use jbd2__journal_start() so we can pass context information for logging purposes. The recommended way for finding the longer-running handles is: T=/sys/kernel/debug/tracing EVENT=$T/events/jbd2/jbd2_handle_stats echo "interval > 5" > $EVENT/filter echo 1 > $EVENT/enable ./run-my-fs-benchmark cat $T/trace > /tmp/problem-handles This will list handles that were active for longer than 20ms. Having longer-running handles is bad, because a commit started at the wrong time could stall for those 20+ milliseconds, which could delay an fsync() or an O_SYNC operation. Here is an example line from the trace file describing a handle which lived on for 311 jiffies, or over 1.2 seconds: postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 dirtied_blocks 0 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: enable ext4 inline supportTao Ma
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-29ext4: fix possible use after free with metadata csumTheodore Ts'o
Commit fa77dcfafeaa introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
2012-10-28ext4: fix unjournaled inode bitmap modificationEric Sandeen
commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-10-22ext4: Checksum the block bitmap properly with bigalloc enabledTao Ma
In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org