summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_alloc_btree.c
AgeCommit message (Collapse)Author
2010-05-24xfs: Improve scalability of busy extent trackingDave Chinner
When we free a metadata extent, we record it in the per-AG busy extent array so that it is not re-used before the freeing transaction hits the disk. This array is fixed size, so when it overflows we make further allocation transactions synchronous because we cannot track more freed extents until those transactions hit the disk and are completed. Under heavy mixed allocation and freeing workloads with large log buffers, we can overflow this array quite easily. Further, the array is sparsely populated, which means that inserts need to search for a free slot, and array searches often have to search many more slots that are actually used to check all the busy extents. Quite inefficient, really. To enable this aspect of extent freeing to scale better, we need a structure that can grow dynamically. While in other areas of XFS we have used radix trees, the extents being freed are at random locations on disk so are better suited to being indexed by an rbtree. So, use a per-AG rbtree indexed by block number to track busy extents. This incures a memory allocation when marking an extent busy, but should not occur too often in low memory situations. This should scale to an arbitrary number of extents so should not be a limitation for features such as in-memory aggregation of transactions. However, there are still situations where we can't avoid allocating busy extents (such as allocation from the AGFL). To minimise the overhead of such occurences, we need to avoid doing a synchronous log force while holding the AGF locked to ensure that the previous transactions are safely on disk before we use the extent. We can do this by marking the transaction doing the allocation as synchronous rather issuing a log force. Because of the locking involved and the ordering of transactions, the synchronous transaction provides the same guarantees as a synchronous log force because it ensures that all the prior transactions are already on disk when the synchronous transaction hits the disk. i.e. it preserves the free->allocate order of the extent correctly in recovery. By doing this, we avoid holding the AGF locked while log writes are in progress, hence reducing the length of time the lock is held and therefore we increase the rate at which we can allocate and free from the allocation group, thereby increasing overall throughput. The only problem with this approach is that when a metadata buffer is marked stale (e.g. a directory block is removed), then buffer remains pinned and locked until the log goes to disk. The issue here is that if that stale buffer is reallocated in a subsequent transaction, the attempt to lock that buffer in the transaction will hang waiting the log to go to disk to unlock and unpin the buffer. Hence if someone tries to lock a pinned, stale, locked buffer we need to push on the log to get it unlocked ASAP. Effectively we are trading off a guaranteed log force for a much less common trigger for log force to occur. Ideally we should not reallocate busy extents. That is a much more complex fix to the problem as it involves direct intervention in the allocation btree searches in many places. This is left to a future set of modifications. Finally, now that we track busy extents in allocated memory, we don't need the descriptors in the transaction structure to point to them. We can replace the complex busy chunk infrastructure with a simple linked list of busy extents. This allows us to remove a large chunk of code, making the overall change a net reduction in code size. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-01-15xfs: Don't directly reference m_perag in allocation codeDave Chinner
Start abstracting the perag references so that the indexing of the structures is not directly coded into all the places that uses the perag structures. This will allow us to separate the use of the perag structure and the way it is indexed and hence avoid the known deadlocks related to growing a busy filesystem. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-14xfs: event tracing supportChristoph Hellwig
Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile | 8 fs/xfs/linux-2.6/xfs_acl.c | 1 fs/xfs/linux-2.6/xfs_aops.c | 52 - fs/xfs/linux-2.6/xfs_aops.h | 2 fs/xfs/linux-2.6/xfs_buf.c | 117 +-- fs/xfs/linux-2.6/xfs_buf.h | 33 fs/xfs/linux-2.6/xfs_fs_subr.c | 3 fs/xfs/linux-2.6/xfs_ioctl.c | 1 fs/xfs/linux-2.6/xfs_ioctl32.c | 1 fs/xfs/linux-2.6/xfs_iops.c | 1 fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_lrw.c | 87 -- fs/xfs/linux-2.6/xfs_lrw.h | 45 - fs/xfs/linux-2.6/xfs_super.c | 104 --- fs/xfs/linux-2.6/xfs_super.h | 7 fs/xfs/linux-2.6/xfs_sync.c | 1 fs/xfs/linux-2.6/xfs_trace.c | 75 ++ fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h | 4 fs/xfs/quota/xfs_dquot.c | 110 --- fs/xfs/quota/xfs_dquot.h | 21 fs/xfs/quota/xfs_qm.c | 40 - fs/xfs/quota/xfs_qm_syscalls.c | 4 fs/xfs/support/ktrace.c | 323 --------- fs/xfs/support/ktrace.h | 85 -- fs/xfs/xfs.h | 16 fs/xfs/xfs_ag.h | 14 fs/xfs/xfs_alloc.c | 230 +----- fs/xfs/xfs_alloc.h | 27 fs/xfs/xfs_alloc_btree.c | 1 fs/xfs/xfs_attr.c | 107 --- fs/xfs/xfs_attr.h | 10 fs/xfs/xfs_attr_leaf.c | 14 fs/xfs/xfs_attr_sf.h | 40 - fs/xfs/xfs_bmap.c | 507 +++------------ fs/xfs/xfs_bmap.h | 49 - fs/xfs/xfs_bmap_btree.c | 6 fs/xfs/xfs_btree.c | 5 fs/xfs/xfs_btree_trace.h | 17 fs/xfs/xfs_buf_item.c | 87 -- fs/xfs/xfs_buf_item.h | 20 fs/xfs/xfs_da_btree.c | 3 fs/xfs/xfs_da_btree.h | 7 fs/xfs/xfs_dfrag.c | 2 fs/xfs/xfs_dir2.c | 8 fs/xfs/xfs_dir2_block.c | 20 fs/xfs/xfs_dir2_leaf.c | 21 fs/xfs/xfs_dir2_node.c | 27 fs/xfs/xfs_dir2_sf.c | 26 fs/xfs/xfs_dir2_trace.c | 216 ------ fs/xfs/xfs_dir2_trace.h | 72 -- fs/xfs/xfs_filestream.c | 8 fs/xfs/xfs_fsops.c | 2 fs/xfs/xfs_iget.c | 111 --- fs/xfs/xfs_inode.c | 67 -- fs/xfs/xfs_inode.h | 76 -- fs/xfs/xfs_inode_item.c | 5 fs/xfs/xfs_iomap.c | 85 -- fs/xfs/xfs_iomap.h | 8 fs/xfs/xfs_log.c | 181 +---- fs/xfs/xfs_log_priv.h | 20 fs/xfs/xfs_log_recover.c | 1 fs/xfs/xfs_mount.c | 2 fs/xfs/xfs_quota.h | 8 fs/xfs/xfs_rename.c | 1 fs/xfs/xfs_rtalloc.c | 1 fs/xfs/xfs_rw.c | 3 fs/xfs/xfs_trans.h | 47 + fs/xfs/xfs_trans_buf.c | 62 - fs/xfs/xfs_vnodeops.c | 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-01-19[XFS] Remove the rest of the macro-to-function indirections.Eric Sandeen
Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30[XFS] Always use struct xfs_btree_block instead of short / longformChristoph Hellwig
structures. Always use the generic xfs_btree_block type instead of the short / long structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for the length of a short / long form block. The rationale for this is that we will grow more btree block header variants to support CRCs and other RAS information, and always accessing them through the same datatype with unions for the short / long form pointers makes implementing this much easier. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32300a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30[XFS] cleanup btree record / key / ptr addressing macros.Christoph Hellwig
Replace the generic record / key / ptr addressing macros that use cpp token pasting with simpler macros that do the job for just one given btree type. The new macros lose the cur argument and thus can be used outside the core btree code, but also gain an xfs_mount * argument to allow for checking the CRC flag in the near future. Note that many of these macros aren't actually used in the kernel code, but only in userspace (mostly in xfs_repair). SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32295a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30[XFS] Cleanup maxrecs calculation.Christoph Hellwig
Clean up the way the maximum and minimum records for the btree blocks are calculated. For the alloc and inobt btrees all the values are pre-calculated in xfs_mount_common, and we switch the current loop around the ugly generic macros that use cpp token pasting to generate type names to two small helpers in normal C code. For the bmbt and bmdr trees these helpers also exist, but can be called during runtime, too. Here we also kill various macros dealing with them and inline the logic into the get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c. Note that all these new helpers take an xfs_mount * argument which will be needed to determine the size of a btree block once we add support for extended btree blocks with CRCs and other RAS information. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32292a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30[XFS] add keys_inorder and recs_inorder btree methodsChristoph Hellwig
Add methods to check whether two keys/records are in the righ order. This replaces the xfs_btree_check_key and xfs_btree_check_rec methods. For the callers from xfs_bmap.c just opencode the bmbt-specific asserts. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32208a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_get_recChristoph Hellwig
Not really much reason to make it generic given that it's so small, but this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c, so it makes the whole btree implementation more structured. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32206a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_delete/delrecChristoph Hellwig
Make the btree delete code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32205a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] move xfs_bmbt_killroot to common codeChristoph Hellwig
xfs_bmbt_killroot is a mostly generic implementation of moving from a real block based root to an inode based root. So move it to xfs_btree.c where it can use all the nice infrastructure there and make it pointer size agnostic The new name for it is xfs_btree_kill_iroot, following the old naming but making it clear we're dealing with the root in inode case here, and to avoid confusion with xfs_btree_new_root which is used for the not inode rooted case. I've also added a comment describing what it does and why it's named the way it is. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32203a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_insert/insrecChristoph Hellwig
Make the btree insert code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32202a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement semi-generic xfs_btree_new_rootChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> Add a xfs_btree_new_root helper for the alloc and ialloc btrees. The bmap btree needs it's own version and is not converted. [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32200a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_splitChristoph Hellwig
Make the btree split code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32198a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_lshiftChristoph Hellwig
Make the btree left shift code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32197a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_rshiftChristoph Hellwig
Make the btree right shift code generic. Based on a patch from David Chinner with lots of changes to follow the original btree implementations more closely. While this loses some of the generic helper routines for inserting/moving/removing records it also solves some of the one off bugs in the original code and makes it easier to verify. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32196a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_updateChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> The most complicated part here is the lastrec tracking for the alloc btree. Most logic is in the update_lastrec method which has to do some hopefully good enough dirty magic to maintain it. [hch: split out from bigger patch and a rework of the lastrec logic] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32194a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_updkeyChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> Note that there are many > 80 char lines introduced due to the xfs_btree_key casts. But the places where this happens is throw-away code once the whole btree code gets merged into a common implementation. The same is true for the temporary xfs_alloc_log_keys define to the new name. All old users will be gone after a few patches. [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32193a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_lookupChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32192a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_decrementChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32191a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] implement generic xfs_btree_incrementChristoph Hellwig
From: Dave Chinner <dgc@sgi.com> Because this is the first major generic btree routine this patch includes some infrastrucure, first a few routines to deal with a btree block that can be either in short or long form, second xfs_btree_read_buf_block, which is the new central routine to read a btree block given a cursor, and third the new xfs_btree_ptr_addr routine to calculate the address for a given btree pointer record. [hch: split out from bigger patch and minor adaptions] SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32190a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] add helpers for addressing entities inside a btree blockChristoph Hellwig
Add new helpers in xfs_btree.c to find the record, key and block pointer entries inside a btree block. To implement this genericly the ->get_maxrecs methods and two new xfs_btree_ops entries for the key and record sizes are used. Also add a big comment describing how the addressing inside a btree block works. Note that these helpers are unused until users are introduced in the next patches and this patch will thus cause some harmless compiler warnings. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32189a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] add get_maxrecs btree operationChristoph Hellwig
Factor xfs_btree_maxrecs into a per-btree operation. The get_maxrecs method is based on a patch from Dave Chinner. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32188a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] make btree tracing genericChristoph Hellwig
Make the existing bmap btree tracing generic so that it applies to all btree types. Some fragments lifted from a patch by Dave Chinner. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32187a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30[XFS] split up xfs_btree_init_cursorChristoph Hellwig
xfs_btree_init_cursor contains close to little shared code for the different btrees and will get even more non-common code in the future. Split it up into one routine per btree type. Because xfs_btree_dup_cursor needs to call the init routine for a generic btree cursor add a new btree operation vector that contains a dup_cursor method that initializes a new cursor based on an existing one. The btree operations vector is based on an idea and code from Dave Chinner and will grow more entries later during this series. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32176a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
2008-02-13xfs: convert beX_add to beX_add_cpu (new common API)Marcin Slusarz
remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-14[XFS] Lazy Superblock CountersDavid Chinner
When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? the answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means the when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an *on-disk format change*. As a result of this, lazy superblock counters are a mkfs option and at the moment on linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28[XFS] Reduce endian flipping in alloc_btree, same as was done forEric Sandeen
ialloc_btree. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26910a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28[XFS] use NULL for pointer initialisation instead of zero-cast-to-ptrNathan Scott
SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26562a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-09-28[XFS] remove left over INT_ comments in *alloc*.c We can verify endianessChristoph Hellwig
handling with sparse now, no need for comments. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26557a Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2006-06-20[XFS] Remove version 1 directory code. Never functioned on Linux, justNathan Scott
pure bloat. SGI-PV: 952969 SGI-Modid: xfs-linux-melb:xfs-kern:26251a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02[XFS] Endianess annotations for various allocator data structuresChristoph Hellwig
SGI-PV: 943272 SGI-Modid: xfs-linux:xfs-kern:201006a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02[XFS] silence gcc4 warnings. the directory ones are wrong because ofChristoph Hellwig
information gcc could not find out (that a directory always has a .. entry), the others are outright gcc bugs. SGI-PV: 943511 SGI-Modid: xfs-linux:xfs-kern:200055a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02[XFS] Update license/copyright notices to match the prefered SGINathan Scott
boilerplate. SGI-PV: 913862 SGI-Modid: xfs-linux:xfs-kern:23903a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-02[XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot.Nathan Scott
SGI-PV: 943122 SGI-Modid: xfs-linux:xfs-kern:23901a Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!