summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2011-08-29Btrfs: fix an oops of log replayliubo
commit 34f3e4f23ca3d259fe078f62a128d97ca83508ef upstream. When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29Btrfs: detect wether a device supports discardJosef Bacik
commit d5e2003c2bcda93a8f2e668eb4642d70c9c38301 upstream. We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix oops when doing space balance Btrfs: don't panic if we get an error while balancing V2 btrfs: add missing options displayed in mount output
2011-07-06btrfs: fix oops when doing space balanceMiao Xie
We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-07-06Btrfs: don't panic if we get an error while balancing V2Josef Bacik
A user reported an error where if we try to balance an fs after a device has been removed it will blow up. This is because we get an EIO back and this is where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks, Reported-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-07-06btrfs: add missing options displayed in mount outputDavid Sterba
There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes
2011-06-27btrfs: fix inconsonant inode informationMiao Xie
When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-25Btrfs: make sure to update total_bitmaps when freeing cache V3Josef Bacik
A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-25Btrfs: fix type mismatch in find_free_extent()Ilya Dryomov
data parameter should be u64 because a full-sized chunk flags field is passed instead of 0/1 for distinguishing data from metadata. All underlying functions expect u64. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-24Btrfs: make sure to record the transid in new inodesChris Mason
When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-24Remove unneeded version.h includes from fs/Jesper Juhl
It was pointed out by 'make versioncheck' that some includes of linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and fs/omfs/file.c). This patch removes them. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: avoid delayed metadata items during commits btrfs: fix uninitialized return value btrfs: fix wrong reservation when doing delayed inode operations btrfs: Remove unused sysfs code btrfs: fix dereference of ERR_PTR value Btrfs: fix relocation races Btrfs: set no_trans_join after trying to expand the transaction Btrfs: protect the pending_snapshots list with trans_lock Btrfs: fix path leakage on subvol deletion Btrfs: drop the delalloc_bytes check in shrink_delalloc Btrfs: check the return value from set_anon_super
2011-06-17Btrfs: avoid delayed metadata items during commitsChris Mason
Snapshot creation has two phases. One is the initial snapshot setup, and the second is done during commit, while nobody is allowed to modify the root we are snapshotting. The delayed metadata insertion code can break that rule, it does a delayed inode update on the inode of the parent of the snapshot, and delayed directory item insertion. This makes sure to run the pending delayed operations before we record the snapshot root, which avoids corruptions. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17btrfs: fix uninitialized return valueDavid Sterba
When allocation fails in btrfs_read_fs_root_no_name, ret is not set although it is returned, holding a garbage value. Signed-off-by: David Sterba <dsterba@suse.cz> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17btrfs: fix wrong reservation when doing delayed inode operationsMiao Xie
We have migrated the space for the delayed inode items from trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to global_block_rsv when we doing delayed inode operations, and the following Oops happened: [ 9792.654889] ------------[ cut here ]------------ [ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681 btrfs_alloc_free_block+0xca/0x27c [btrfs]() [ 9792.654899] Hardware name: To Be Filled By O.E.M. [ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] [ 9792.654919] Pid: 2762, comm: rm Tainted: G W 2.6.39+ #1 [ 9792.654920] Call Trace: [ 9792.654922] [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b [ 9792.654925] [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c [ 9792.654933] [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs] [ 9792.654945] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.654953] [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs] [ 9792.654963] [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs] [ 9792.654970] [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs] [ 9792.654978] [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs] [ 9792.654986] [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs] [ 9792.654997] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.655022] [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs] [ 9792.655025] [<ffffffff8147afac>] ? _cond_resched+0xe/0x22 [ 9792.655027] [<ffffffff8147b892>] ? mutex_lock+0x29/0x50 [ 9792.655039] [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs] [ 9792.655051] [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs] [ 9792.655062] [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs] [ 9792.655064] [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a [ 9792.655075] [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs] [ 9792.655077] [<ffffffff81132bd6>] evict+0x71/0x111 [ 9792.655079] [<ffffffff81132de0>] iput+0x12a/0x132 [ 9792.655081] [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155 [ 9792.655083] [<ffffffff81127b83>] ? path_put+0x1f/0x23 [ 9792.655085] [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171 [ 9792.655087] [<ffffffff81128410>] ? putname+0x34/0x36 [ 9792.655090] [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b [ 9792.655092] [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b [ 9792.655093] ---[ end trace 02b696eb02b3f768 ]--- This patch fix it by setting the reservation of the transaction handle to the correct one. Reported-by: Josef Bacik <josef@redhat.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17btrfs: Remove unused sysfs codeMaarten Lankhorst
Removes code no longer used. The sysfs file itself is kept, because the btrfs developers expressed interest in putting new entries to sysfs. Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17btrfs: fix dereference of ERR_PTR valueDavid Sterba
smatch reports: btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing possible ERR_PTR() Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17Merge branch 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17Btrfs: fix relocation racesChris Mason
The recent commit to get rid of our trans_mutex introduced some races with block group relocation. The problem is that relocation needs to do some record keeping about each root, and it was relying on the transaction mutex to coordinate things in subtle ways. This fix adds a mutex just for the relocation code and makes sure it doesn't have a big impact on normal operations. The race is really fixed in btrfs_record_root_in_trans, which is where we step back and wait for the relocation code to finish accounting setup. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-15Btrfs: set no_trans_join after trying to expand the transactionJosef Bacik
We can lockup if we try to allow new writers join the transaction and we have flushoncommit set or have a pending snapshot. This is because we set no_trans_join and then loop around and try to wait for ordered extents again. The problem is the ordered endio stuff needs to join the transaction, which it can't do because no_trans_join is set. So instead wait until after this loop to set no_trans_join and then make sure to wait for num_writers == 1 in case anybody got started in between us exiting the loop and setting no_trans_join. This could easily be reproduced by mounting -o flushoncommit and running xfstest 13. It cannot be reproduced with this patch. Thanks, Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15Btrfs: protect the pending_snapshots list with trans_lockJosef Bacik
Currently there is nothing protecting the pending_snapshots list on the transaction. We only hold the directory mutex that we are snapshotting and a read lock on the subvol_sem, so we could race with somebody else creating a snapshot in a different directory and end up with list corruption. So protect this list with the trans_lock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-15Btrfs: fix path leakage on subvol deletionJosef Bacik
The delayed ref patch accidently removed the btrfs_free_path in btrfs_unlink_subvol, this puts it back and means we don't leak a path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-13Btrfs: drop the delalloc_bytes check in shrink_delallocChris Mason
Even when delalloc_bytes is zero, we may need to sleep while waiting for delalloc space. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-13Btrfs: check the return value from set_anon_superChris Mason
Al Viro noticed we weren't checking for set_anon_super failures. This adds the required checks. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: use join_transaction in btrfs_evict_inode() Btrfs - use %pU to print fsid Btrfs: fix extent state leak on failed nodatasum reads btrfs: fix unlocked access of delalloc_inodes Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline Btrfs: clear current->journal_info on async transaction commit Btrfs: make sure to recheck for bitmaps in clusters btrfs: remove unneeded includes from scrub.c btrfs: reinitialize scrub workers btrfs: scrub: errors in tree enumeration Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: unlock the trans lock properly Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: fix duplicate checking logic Btrfs: fix the allocator loop logic Btrfs: fix bitmap regression Btrfs: don't commit the transaction if we dont have enough pinned bytes Btrfs: noinline the cluster searching functions Btrfs: cache bitmaps when searching for a cluster
2011-06-11Btrfs: use join_transaction in btrfs_evict_inode()Li Zefan
The WARN_ON() in start_transaction() was triggered while balancing. The cause is btrfs_relocate_chunk() started a transaction and then called iput() on the inode that stores free space cache, and iput() called btrfs_start_transaction() again. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Btrfs - use %pU to print fsidIlya Dryomov
Get rid of FIXME comment. Uuids from dmesg are now the same as uuids given by btrfs-progs. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Btrfs: fix extent state leak on failed nodatasum readsJan Schmidt
When encountering an EIO while reading from a nodatasum extent, we insert an error record into the inode's failure tree. btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd better clear the failure tree in that case, otherwise the kernel complains about BUG extent_state: Objects remaining on kmem_cache_close() on rmmod. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Merge branch 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into for-linus
2011-06-10btrfs: fix unlocked access of delalloc_inodesDavid Sterba
list_splice_init will make delalloc_inodes empty, but without a spinlock around, this may produce corrupted list head, accessed in many placess, The race window is very tight and nobody seems to have hit it so far. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Btrfs: avoid stack bloat in btrfs_ioctl_fs_info()Li Zefan
The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so don't declare the variable on stack. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one ↵richard kennedy
fewer cacheline Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit builds. This shrinks its size to 128 bytes allowing it to fit into one fewer cache lines and allows more objects per slab in its kmem_cache. slabinfo extent_buffer reports :- before:- Sizes (bytes) Slabs ---------------------------------- Object : 136 Total : 123 SlabObj: 136 Full : 121 SlabSiz: 4096 Partial: 0 Loss : 0 CpuSlab: 2 Align : 8 Objects: 30 after :- Object : 128 Total : 4 SlabObj: 128 Full : 2 SlabSiz: 4096 Partial: 0 Loss : 0 CpuSlab: 2 Align : 8 Objects: 32 Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Btrfs: clear current->journal_info on async transaction commitSage Weil
Normally current->jouranl_info is cleared by commit_transaction. For an async snap or subvol creation, though, it runs in a work queue. Clear it in btrfs_commit_transaction_async() to avoid leaking a non-NULL journal_info when we return to userspace. When the actual commit runs in the other thread it won't care that it's current->journal_info is already NULL. Signed-off-by: Sage Weil <sage@newdream.net> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10Btrfs: make sure to recheck for bitmaps in clustersChris Mason
Josef recently changed the free extent cache to look in the block group cluster for any bitmaps before trying to add a new bitmap for the same offset. This avoids BUG_ON()s due covering duplicate ranges. But it didn't go quite far enough. A given free range might span between one or more bitmaps or free space entries. The code has looping to cover this, but it doesn't check for clustered bitmaps every time. This shuffles our gotos to check for a bitmap in the cluster for every new bitmap entry we try to add. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-10btrfs: remove unneeded includes from scrub.cArne Jansen
Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10btrfs: reinitialize scrub workersArne Jansen
Scrub starts the workers each time a scrub starts and stops them after it finished. This patch adds an initialization for the workers before each start, otherwise the workers behave strangely. Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10btrfs: scrub: errors in tree enumerationArne Jansen
due to the semantics of btrfs_search_slot the path can point to an invalid slot when ret > 0. This condition went unnoticed, which in turn could have led to an incomplete scrubbing. Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-06-10Btrfs: don't map extent buffer if path->skip_locking is setJosef Bacik
Arne's scrub stuff exposed a problem with mapping the extent buffer in reada_for_search. He searches the commit root with multiple threads and with skip_locking set, so we can race and overwrite node->map_token since node isn't locked. So fix this so that we only map the extent buffer if we don't already have a map_token and skip_locking isn't set. Without this patch scrub would panic almost immediately, with the patch it doesn't panic anymore. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09Btrfs: unlock the trans lock properlyJosef Bacik
In btrfs_wait_for_commit if we came upon a transaction that had committed we just exited, but that's bad since we are holding the trans_lock. So break instead so that the lock is dropped. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-09Btrfs: don't map extent buffer if path->skip_locking is setJosef Bacik
Arne's scrub stuff exposed a problem with mapping the extent buffer in reada_for_search. He searches the commit root with multiple threads and with skip_locking set, so we can race and overwrite node->map_token since node isn't locked. So fix this so that we only map the extent buffer if we don't already have a map_token and skip_locking isn't set. Without this patch scrub would panic almost immediately, with the patch it doesn't panic anymore. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: fix duplicate checking logicJosef Bacik
When merging my code into the integration test the second check for duplicate entries got screwed up. This patch fixes it by dropping ret2 and just using ret for the return value, and checking if we got an error before adding the bitmap to the local list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: fix the allocator loop logicJosef Bacik
I was testing with empty_cluster = 0 to try and reproduce a problem and kept hitting early enospc panics. This was because our loop logic was a little confused. So this is what I did 1) Make the loop variable the ultimate decider on wether we should loop again isntead of checking to see if we had an uncached bg, empty size or empty cluster. 2) Increment loop before checking to see what we are on to make the loop definitions make more sense. 3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0 unless we didn't actually allocate a chunk. If we did allocate a chunk we should be able to easily setup a new cluster so clearing empty_size/empty_cluster makes us less efficient. This kept me from hitting panics while trying to reproduce the other problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: fix bitmap regressionJosef Bacik
In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: don't commit the transaction if we dont have enough pinned bytesJosef Bacik
I noticed when running an enospc test that we would get stuck committing the transaction in check_data_space even though we truly didn't have enough space. So check to see if bytes_pinned is bigger than num_bytes, if it's not don't commit the transaction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: noinline the cluster searching functionsJosef Bacik
When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-08Btrfs: cache bitmaps when searching for a clusterJosef Bacik
If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-06-07Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: make unlink() and rmdir() return ENOENT in preference to EROFS lmLogOpen() broken failure exit usb: remove bad dput after dentry_unhash more conservative S_NOSEC handling
2011-06-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) btrfs: fix uninitialized variable warning btrfs: add helper for fs_info->closing Btrfs: add mount -o inode_cache btrfs: scrub: add explicit plugging btrfs: use btrfs_ino to access inode number Btrfs: don't save the inode cache if we are deleting this root btrfs: false BUG_ON when degraded Btrfs: don't save the inode cache in non-FS roots Btrfs: make sure we don't overflow the free space cache crc page Btrfs: fix uninit variable in the delayed inode code btrfs: scrub: don't reuse bios and pages Btrfs: leave spinning on lookup and map the leaf Btrfs: check for duplicate entries in the free space cache Btrfs: don't try to allocate from a block group that doesn't have enough space Btrfs: don't always do readahead Btrfs: try not to sleep as much when doing slow caching Btrfs: kill BTRFS_I(inode)->block_group Btrfs: don't look at the extent buffer level 3 times in a row Btrfs: map the node block when looking for readahead targets Btrfs: set range_start to the right start in count_range_bits ...
2011-06-04btrfs: fix uninitialized variable warningDavid Sterba
With Linus' tree, today's linux-next build (powercp ppc64_defconfig) produced this warning: fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode': fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used uninitialized in this function Introduced by commit 16cdcec736cd ("btrfs: implement delayed inode items operation"). This fixes a bug in btrfs_update_inode(): if the returned value from btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not updated and several call paths may hit a BUG_ON or fail with strange code. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David Sterba <dsterba@suse.cz>