summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2013-06-10btrfs: use rcu_barrier() to wait for bdev puts at unmountEric Sandeen
commit bc178622d40d87e75abc131007342429c9b03351 upstream. Doing this would reliably fail with -EBUSY for me: # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2 ... unable to open /dev/sdb2: Device or resource busy because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it. Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path: btrfs_close_devices __btrfs_close_devices call_rcu(&device->rcu, free_device); free_device INIT_WORK(&device->rcu_work, __free_device); schedule_work(&device->rcu_work); so unmount might complete before __free_device fires & does its blkdev_put. Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2012-10-07Btrfs: call the ordered free operation without any locks heldChris Mason
commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2011-05-09btrfs: Require CAP_SYS_ADMIN for filesystem rebalanceBen Hutchings
commit 6f88a4403def422bd8e276ddf6863d6ac71435d2 upstream. Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Btrfs: Fix uninitialized root flags for subvolumesLi Zefan
commit 08fe4db170b4193603d9d31f40ebaf652d07ac9c upstream. root_item->flags and root_item->byte_limit are not initialized when a subvolume is created. This bug is not revealed until we added readonly snapshot support - now you mount a btrfs filesystem and you may find the subvolumes in it are readonly. To work around this problem, we steal a bit from root_item->inode_item->flags, and use it to indicate if those fields have been properly initialized. When we read a tree root from disk, we check if the bit is set, and if not we'll set the flag and initialize the two fields of the root item. Reported-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: kfree correct pointer during mount option parsingJosef Bacik
commit da495ecc0fb096b383754952a1c152147bc95b52 upstream. We kstrdup the options string, but then strsep screws with the pointer, so when we kfree() it, we're not giving it the right pointer. Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: btrfs_mark_extent_written uses the wrong slotShaohua Li
commit 3f6fae9559225741c91f1320090b285da1413290 upstream. My test do: fallocate a big file and do write. The file is 512M, but after file write is done btrfs-debug-tree shows: item 6 key (257 EXTENT_DATA 0) itemoff 3516 itemsize 53 extent data disk byte 1103101952 nr 536870912 extent data offset 0 nr 399634432 ram 536870912 extent compression 0 Looks like a regression introducted by 6c7d54ac87f338c479d9729e8392eca3f76e11e1, where we set wrong slot. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: apply updated fallocate i_size fixAneesh Kumar K.V
commit 23b5c50945f2294add0137799400329c0ebba290 upstream. This version of the i_size fix for fallocate makes sure we only update the i_size when the current fallocate is really operating outside of i_size. Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: do not try and lookup the file extent when finishing ordered ioJosef Bacik
commit efd049fb26a162c3830fd3cb1001fdc09b147f3b upstream. When running the following fio job [torrent] filename=torrent-test rw=randwrite size=4g filesize=4g bs=4k ioengine=sync you would see long stalls where no work was being done. That is because we were doing all this extra work to read in the file extent outside of the transaction, however in the random io case this ends up hurting us because the file extents are not there to begin with. So axe this logic, since we end up reading in the file extent when we go to update it anyway. This took the fio job from 11 mb/s with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Fix oopsen when dropping empty tree.Yan, Zheng
commit 7a7965f83e89f0be506a96769938a721e4e5ae50 upstream. When dropping a empty tree, walk_down_tree() skips checking extent information for the tree root. This will triggers a BUG_ON in walk_up_proc(). Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: remove BUG_ON() due to mounting bad filesystemMiao Xie
commit d7ce5843bb28ada6845ab2ae8510ba3f12d33154 upstream. Mounting a bad filesystem caused a BUG_ON(). The following is steps to reproduce it. # mkfs.btrfs /dev/sda2 # mount /dev/sda2 /mnt # mkfs.btrfs /dev/sda1 /dev/sda2 (the program says that /dev/sda2 was mounted, and then exits. ) # umount /mnt # mount /dev/sda1 /mnt At the third step, mkfs.btrfs exited in the way of make filesystem. So the initialization of the filesystem didn't finish. So the filesystem was bad, and it caused BUG_ON() when mounting it. But BUG_ON() should be called by the wrong code, not user's operation, so I think it is a bug of btrfs. This patch fixes it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: make error return negative in btrfs_sync_file()Roel Kluin
commit 014e4ac4f7d9c981750491fa40ea35efadc9ed49 upstream. It appears the error return should be negative Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: fix race between allocate and release extent buffer.Yan, Zheng
commit f044ba7835b84e69c68b620ca8fa27e5ef67759d upstream. Increase extent buffer's reference count while holding the lock. Otherwise it can race with try_release_extent_buffer. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: check total number of devices when removing missingJosef Bacik
commit 035fe03a7ad56982b30ab3a522b7b08d58feccd0 upstream. If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: check return value of open_bdev_exclusive properlyJosef Bacik
commit 7f59203abeaf18bf3497b308891f95a4489810ad upstream. Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So change the return value properly. This is important if you accidently specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: do not mark the chunk as readonly if in degraded modeJosef Bacik
commit f48b90756bd834dda852ff514f2690d3175b1f44 upstream. If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: run orphan cleanup on default fs rootJosef Bacik
commit e3acc2a6850efff647f1c5458524eb3a8bcba20a upstream. This patch revert's commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: fix a memory leak in btrfs_init_aclYang Hongyang
commit f858153c367a397235d3e81136741e40e44faf1d upstream. In btrfs_init_acl() cloned acl is not released Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Use correct values when updating inode i_size on fallocateAneesh Kumar K.V
commit d1ea6a61454e7d7ff0873d0ad1ae27d5807da0d3 upstream. commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: fix possible panic on unmountJosef Bacik
commit 11dfe35a0108097f2df1f042c485fa7f758c2cdf upstream. We can race with the unmount of an fs and the stopping of a kthread where we will free the block group before we're done using it. The reason for this is because we do not hold a reference on the block group while its caching, since the allocator drops its reference once it exits or moves on to the next block group. This patch fixes the problem by taking a reference to the block group before we start caching and dropping it when we're done to make sure all accesses to the block group are safe. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: deal with NULL acl sent to btrfs_set_aclChris Mason
commit a9cc71a60c29a09174bee2fcef8f924c529fd4b7 upstream. It is legal for btrfs_set_acl to be sent a NULL acl. This makes sure we don't dereference it. A similar patch was sent by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: fix regression in orphan cleanupJosef Bacik
commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 upstream. Currently orphan cleanup only ever gets triggered if we cross subvolumes during a lookup, which means that if we just mount a plain jane fs that has orphans in it, they will never get cleaned up. This results in panic's like these http://www.kerneloops.org/oops.php?number=1109085 where adding an orphan entry results in -EEXIST being returned and we panic. In order to fix this, we check to see on lookup if our root has had the orphan cleanup done, and if not go ahead and do it. This is easily reproduceable by running this testcase #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <unistd.h> #include <stdio.h> int main(int argc, char **argv) { char data[4096]; char newdata[4096]; int fd1, fd2; memset(data, 'a', 4096); memset(newdata, 'b', 4096); while (1) { int i; fd1 = creat("file1", 0666); if (fd1 < 0) break; for (i = 0; i < 512; i++) write(fd1, data, 4096); fsync(fd1); close(fd1); fd2 = creat("file2", 0666); if (fd2 < 0) break; ftruncate(fd2, 4096 * 512); for (i = 0; i < 512; i++) write(fd2, newdata, 4096); close(fd2); i = rename("file2", "file1"); unlink("file1"); } return 0; } and then pulling the power on the box, and then trying to run that test again when the box comes back up. I've tested this locally and it fixes the problem. Thanks to Tomas Carnecky for helping me track this down initially. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Fix race in btrfs_mark_extent_writtenYan, Zheng
commit 6c7d54ac87f338c479d9729e8392eca3f76e11e1 upstream. Fix bug reported by Johannes Hirte. The reason of that bug is btrfs_del_items is called after btrfs_duplicate_item and btrfs_del_items triggers tree balance. The fix is check that case and call btrfs_search_slot when needed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs, fix memory leaks in error pathsJiri Slaby
commit 2423fdfb96e3f9ff3baeb6c4c78d74145547891d upstream. Stanse found 2 memory leaks in relocate_block_group and __btrfs_map_block. cluster and multi are not freed/assigned on all paths. Fix that. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: align offsets for btrfs_ordered_update_i_sizeYan, Zheng
commit a038fab0cb873c75d6675e2bcffce8a3935bdce7 upstream. Some callers of btrfs_ordered_update_i_size can now pass in a NULL for the ordered extent to update against. This makes sure we properly align the offset they pass in when deciding how much to bump the on disk i_size. Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13btrfs: fix missing last-entry in readdir(3)Jan Engelhardt
commit 406266ab9ac8ed8b085c58aacd9e3161480dc5d5 upstream. parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd) commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d Author: Jan Engelhardt <jengelh@medozas.de> Date: Wed Dec 9 22:57:36 2009 +0100 Btrfs: fix missing last-entry in readdir(3) When one does a 32-bit readdir(3), the last entry of a directory is missing. This is however not due to passing a large value to filldir, but it seems to have to do with glibc doing telldir or something quirky. In any case, this patch fixes it in practice. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: make sure fallocate properly starts a transactionChris Mason
commit 3a1abec9f6880cf406593c392636199ea1c6c917 upstream. The recent patch to make fallocate enospc friendly would send down a NULL trans handle to the allocator. This moves the transaction start to properly fix things. Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: make metadata chunks smallerJosef Bacik
commit 83d3c9696fed237a3d96fce18299e2fcf112109f upstream. This patch makes us a bit less zealous about making sure we have enough free metadata space by pearing down the size of new metadata chunks to 256mb instead of 1gb. Also, we used to try an allocate metadata chunks when allocating data, but that sort of thing is done elsewhere now so we can just remove it. With my -ENOSPC test I used to have 3gb reserved for metadata out of 75gb, now I have 1.7gb. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Show discard option in /proc/mountsMatthew Wilcox
commit 20a5239a5d0f340e29827a6a2d28a138001c44b8 upstream. Christoph's patch e244a0aeb6a599c19a7c802cda6e2d67c847b154 doesn't display the discard option in /proc/mounts, leading to some confusion for me. Here's the missing bit. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: deny sys_link across subvolumes.TARUISI Hiroaki
commit 4a8be425a8fb8fbb5d881eb55fa6634c3463b9c9 upstream. I rebased Christian Parpart's patch to deny hard link across subvolumes. Original patch modifies also btrfs_rename, but I excluded it because we can move across subvolumes now and it make no problem. ----------------- Hard link across subvolumes should not allowed in Btrfs. btrfs_link checks root of 'to' directory is same as root of 'from' file. If not same, btrfs_link returns -EPERM. Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: fail mount on bad mount optionsSage Weil
commit a7a3f7cadd9bdee569243f7ead9550aa16b60e07 upstream. We shouldn't silently ignore unrecognized options. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: don't add extent 0 to the free space cache v2Yan, Zheng
commit 06b2331f8333ec6edf41662757ce8882cc1747d5 upstream. If block group 0 is completely free, btrfs_read_block_groups will add extent [0, BTRFS_SUPER_INFO_OFFSET) to the free space cache. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Fix per root used space accountingYan, Zheng
commit 86b9f2eca5e0984145e3c7698a7cd6dd65c2a93f upstream. The bytes_used field in root item was originally planned to trace the amount of used data and tree blocks. But it never worked right since we can't trace freeing of data accurately. This patch changes it to only trace the amount of tree blocks. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Fix btrfs_drop_extent_cache for skip pinned caseYan, Zheng
commit 55ef68990029fcd8d04d42fc184aa7fb18cf309e upstream. The check for skip pinned case is wrong, it may breaks the while loop too soon. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Add delayed iputYan, Zheng
commit 24bbcf0442ee04660a5a030efdbb6d03f1c275cb upstream. iput() can trigger new transactions if we are dropping the final reference, so calling it in btrfs_commit_transaction may end up deadlock. This patch adds delayed iput to avoid the issue. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Pass transaction handle to security and ACL initialization functionsYan, Zheng
commit f34f57a3ab4e73304d78c125682f1a53cd3975f2 upstream. Pass transaction handle down to security and ACL initialization functions, so we can avoid starting nested transactions Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Make truncate(2) more ENOSPC friendlyYan, Zheng
commit 8082510e7124cc50d728f1b875639cb4e22312cc upstream. truncating and deleting regular files are unbound operations, so it's not good to do them in a single transaction. This patch makes btrfs_truncate and btrfs_delete_inode start a new transaction after all items in a tree leaf are deleted. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Make fallocate(2) more ENOSPC friendlyYan, Zheng
commit 5a303d5d4b8055d2e5a03e92d04745bfc5881a22 upstream. fallocate(2) may allocate large number of file extents, so it's not good to do it in a single transaction. This patch make fallocate(2) start a new transaction for each file extents it allocates. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Avoid orphan inodes cleanup during committing transactionYan, Zheng
commit 2e4bfab97055aa6acdd0637913bd705c2d6506d6 upstream. btrfs_lookup_dentry may trigger orphan cleanup, so it's not good to call it while committing a transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Avoid orphan inodes cleanup while replaying logYan, Zheng
commit c71bf099abddf3e0fdc27f251ba76fca1461d49a upstream. We do log replay in a single transaction, so it's not good to do unbound operations. This patch cleans up orphan inodes cleanup after replaying the log. It also avoids doing other unbound operations such as truncating a file during replaying log. These unbound operations are postponed to the orphan inode cleanup stage. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Fix disk_i_size update corner caseYan, Zheng
commit c216775458a2ee345d9412a2770c2916acfb5d30 upstream. There are some cases file extents are inserted without involving ordered struct. In these cases, we update disk_i_size directly, without checking pending ordered extent and DELALLOC bit. This patch extends btrfs_ordered_update_i_size() to handle these cases. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Rewrite btrfs_drop_extentsYan, Zheng
commit 920bbbfb05c9fce22e088d20eb9dcb8f96342de9 upstream. Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Add btrfs_duplicate_itemYan, Zheng
commit ad48fd754676bfae4139be1a897b1ea58f9aaf21 upstream. btrfs_duplicate_item duplicates item with new key, guaranteeing the source item and the new items are in the same tree leaf and contiguous. It allows us to split file extent in place, without using lock_extent to prevent bookend extent race. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13Btrfs: Avoid superfluous tree-log writeoutYan, Zheng
commit 8cef4e160d74920ad1725f58c89fd75ec4c4ac38 upstream. We allow two log transactions at a time, but use same flag to mark dirty tree-log btree blocks. So we may flush dirty blocks belonging to newer log transaction when committing a log transaction. This patch fixes the issue by using two flags to mark dirty tree-log btree blocks. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02Btrfs: fix checks in BTRFS_IOC_CLONE_RANGEDan Rosenberg
commit 2ebc3464781ad24474abcbd2274e6254689853b5 upstream. 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05Btrfs: should add a permission check for setfaclShi Weihua
commit 2f26afba46f0ebf155cf9be746496a0304a5b7cf upstream. On btrfs, do the following ------------------ # su user1 # cd btrfs-part/ # touch aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rw- group::rw- other::r-- # su user2 # cd btrfs-part/ # setfacl -m u::rwx aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rwx <- successed to setfacl group::rw- other::r-- ------------------ but we should prohibit it that user2 changing user1's acl. In fact, on ext3 and other fs, a message occurs: setfacl: aaa: Operation not permitted This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26Btrfs: check for read permission on src file in the clone ioctlDan Rosenberg
commit 5dc6416414fb3ec6e2825fd4d20c8bf1d7fe0395 upstream. The existing code would have allowed you to clone a file that was only open for writing Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix panic when trying to destroy a newly allocated Btrfs: allow more metadata chunk preallocation Btrfs: fallback on uncompressed io if compressed io fails Btrfs: find ideal block group for caching Btrfs: avoid null deref in unpin_extent_cache() Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root Btrfs: fix some metadata enospc issues Btrfs: fix how we set max_size for free space clusters Btrfs: cleanup transaction starting and fix journal_info usage Btrfs: fix data allocation hint start
2009-11-11Btrfs: fix panic when trying to destroy a newly allocatedJosef Bacik
There is a problem where iget5_locked will look for an inode, not find it, and then subsequently try to allocate it. Another CPU will have raced in and allocated the inode instead, so when iget5_locked gets the inode spin lock again and does a search, it finds the new inode. So it goes ahead and calls destroy_inode on the inode it just allocated. The problem is we don't set BTRFS_I(inode)->root until the new inode is completely initialized. This patch makes us set root to NULL when alloc'ing a new inode, so when we get to btrfs_destroy_inode and we see that root is NULL we can just free up the memory and continue on. This fixes the panic http://www.kerneloops.org/submitresult.php?number=812690 Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: allow more metadata chunk preallocationChris Mason
On an FS where all of the space has not been allocated into chunks yet, the enospc can return enospc just because the existing metadata chunks are full. We get around this by allowing more metadata chunks to be allocated up to a certain limit, and finding the right limit is a little fuzzy. The problem is the reservations for delalloc would preallocate way too much of the FS as metadata. We need to start saying no and just force some IO to happen. But we also need to let a reasonable amount of the FS become metadata. This bumps the hard limit up, later releases will have a better system. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fallback on uncompressed io if compressed io failsJosef Bacik
Currently compressed IO does not deal with not having its entire extent able to be allocated. So if we have enough free space to allocate for the extent, but its not contiguous, it will fail spectacularly. This patch fixes this by falling back on uncompressed IO which lets us spread the delalloc extent across multiple extents. I tested this by making us randomly think the reservation had failed to make it fallback on the uncompressed io way and it seemed to work fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>