summaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)Author
2010-07-28GFS2: Use kmalloc when possible for ->readdir()Steven Whitehouse
If we don't need a huge amount of memory in ->readdir() then we can use kmalloc rather than vmalloc to allocate it. This should cut down on the greater overheads associated with vmalloc for smaller directories. We may be able to eliminate vmalloc entirely at some stage, but this is easy to do right away. Also using GFP_NOFS to avoid any issues wrt to deleting inodes while under a glock, and suggestion from Linus to factor out the alloc/dealloc. I've given this a test with a variety of different sized directories and it seems to work ok. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-19mm: add context argument to shrinker callbackDave Chinner
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-15GFS2: rename causes kernel OopsBob Peterson
This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-15GFS2: BUG in gfs2_adjust_quotaAbhijith Das
HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-15GFS2: Fix kernel NULL pointer dereference by dlm_astdBob Peterson
This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-07-15GFS2: recovery stuck on transaction lockBob Peterson
This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-07-15GFS2: O_TRUNC not working on stuffed files across clusterBob Peterson
This patch replaces a statement that got dropped out by accident. Without the patch, truncates on stuffed (very small) files cause those files to have an unpredictable size. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-27kill spurious reference to vmtruncatenpiggin@suse.de
Lots of filesystems calls vmtruncate despite not implementing the old ->truncate method. Switch them to use simple_setsize and add some comments about the truncate code where it seems fitting. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27drop unused dentry argument to ->fsyncChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix permissions checking for setflags ioctl() GFS2: Don't "get" xattrs for ACLs when ACLs are turned off GFS2: Rework reclaiming unlinked dinodes
2010-05-24GFS2: Fix permissions checking for setflags ioctl()Steven Whitehouse
We should be checking for the ownership of the file for which flags are being set, rather than just for write access. Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits) fix handling of offsets in cris eeprom.c, get rid of fake on-stack files get rid of home-grown mutex in cris eeprom.c switch ecryptfs_write() to struct inode *, kill on-stack fake files switch ecryptfs_get_locked_page() to struct inode * simplify access to ecryptfs inodes in ->readpage() and friends AFS: Don't put struct file on the stack Ban ecryptfs over ecryptfs logfs: replace inode uid,gid,mode initialization with helper function ufs: replace inode uid,gid,mode initialization with helper function udf: replace inode uid,gid,mode init with helper ubifs: replace inode uid,gid,mode initialization with helper function sysv: replace inode uid,gid,mode initialization with helper function reiserfs: replace inode uid,gid,mode initialization with helper function ramfs: replace inode uid,gid,mode initialization with helper function omfs: replace inode uid,gid,mode initialization with helper function bfs: replace inode uid,gid,mode initialization with helper function ocfs2: replace inode uid,gid,mode initialization with helper function nilfs2: replace inode uid,gid,mode initialization with helper function minix: replace inode uid,gid,mode init with helper ext4: replace inode uid,gid,mode init with helper ... Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
2010-05-21gfs: constify xattr_handlerStephen Hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21Merge branch 'master' into for-2.6.35Jens Axboe
Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21quota: unify ->set_dqblkChristoph Hellwig
Pass the larger struct fs_disk_quota to the ->set_dqblk operation so that the Q_SETQUOTA and Q_XSETQUOTA operations can be implemented with a single filesystem operation and we can retire the ->set_xquota operation. The additional information (RT-subvolume accounting and warn counts) are left zero for the VFS quota implementation. Add new fieldmask values for setting the numer of blocks and inodes values which is required for the VFS quota, but wasn't for XFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21quota: unify ->get_dqblkChristoph Hellwig
Pass the larger struct fs_disk_quota to the ->get_dqblk operation so that the Q_GETQUOTA and Q_XGETQUOTA operations can be implemented with a single filesystem operation and we can retire the ->get_xquota operation. The additional information (RT-subvolume accounting and warn counts) are left zero for the VFS quota implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21GFS2: Don't "get" xattrs for ACLs when ACLs are turned offSteven Whitehouse
This is to match ext3 behaviour. We should not allow getting of xattrs relating to ACLs when ACLs are turned off. Reported-by: Nate Straz <nstraz@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-21GFS2: Rework reclaiming unlinked dinodesBob Peterson
The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Fix typo GFS2: stuck in inode wait, no glocks stuck GFS2: Eliminate useless err variable GFS2: Fix writing to non-page aligned gfs2_quota structures GFS2: Add some useful messages GFS2: fix quota state reporting GFS2: Various gfs2_logd improvements GFS2: glock livelock GFS2: Clean up stuffed file copying GFS2: docs update GFS2: Remove space from slab cache name
2010-05-14GFS2: Fix typoSteven Whitehouse
A missing ! in a test. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-12GFS2: stuck in inode wait, no glocks stuckBob Peterson
This patch changes the lock ordering when gfs2 reclaims unlinked dinodes, thereby avoiding a livelock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-12GFS2: Eliminate useless err variableBob Peterson
This patch removes an unneeded "err" variable that is always returned as zero. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-10GFS2: Fix writing to non-page aligned gfs2_quota structuresAbhijith Das
This is the upstream fix for this bug. This patch differs from the RHEL5 fix (Red Hat bz #555754) which simply writes to the 8-byte value field of the quota. In upstream quota code, we're required to write the entire quota (88 bytes) which can be split across a page boundary. We check for such quotas, and read/write the two parts from/to the corresponding pages holding these parts. With this patch, I don't see the bug anymore using the reproducer in Red Hat bz 555754. I successfully ran a couple of simple tests/mounts/ umounts and it doesn't seem like this patch breaks anything else. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-06GFS2: Add some useful messagesSteven Whitehouse
The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-05GFS2: fix quota state reportingChristoph Hellwig
We need to report both the accounting and enforcing flags if we are in enforcing mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-05-05GFS2: Various gfs2_logd improvementsBenjamin Marzinski
This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-04-28blkdev: generalize flags for blkdev_issue_fn functionsDmitry Monakhov
The patch just convert all blkdev_issue_xxx function to common set of flags. Wait/allocation semantics preserved. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-14GFS2: glock livelockBob Peterson
This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-29GFS2: Clean up stuffed file copyingSteven Whitehouse
If the inode size was corrupt for stuffed files, it was possible for the copying of data to overrun the block and/or page. This patch checks for that condition so that this is no longer possible. This is also preparation for the new truncate sequence patch which requires the ability to have stuffed files with larger sizes than (disk block size - sizeof(on disk inode)) with the restriction that only the initial part of the file may be non-zero. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-29GFS2: Remove space from slab cache nameSteven Whitehouse
Apparently this might confuse parsers. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Skip check for mandatory locks when unlocking GFS2: Allow the number of committed revokes to temporarily be negative GFS2: do not select QUOTA
2010-03-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-11GFS2: Skip check for mandatory locks when unlockingSachin Prabhu
gfs2_lock() will skip locks on file which have mode set to 02666. This is a problem in cases where the mode of the file is changed after a process has obtained a lock on the file. Such a lock will be skipped and will result in a BUG in locks_remove_flock(). gfs2_lock() should skip the check for mandatory locks when unlocking a file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-11GFS2: Allow the number of committed revokes to temporarily be negativeBenjamin Marzinski
GFS2 tracks the number of revokes and unrevokes that are part of committed transactions via sd_log_commited_revoke. It is possible for one process to add revokes during its transaction, while another process unrevokes them during its transaction. If the second process finishes its transaction first, sd_log_commited_revoke will be decremented by the number of unrevokes that the second process did, without first being incremented by the number of revokes the first process did. This is fine, since all started transactions must be completed before the journal can be flushed. However, sd_log_commited_revoke is an unsigned integer, and log_refund() causes an assertion failure if it would go negative at the end of a transaction. This patch makes sd_log_commited_revoke a signed integer and allows it to go negative. __gfs2_log_flush() still checks that it mataches the actual number of revokes. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-09GFS2: do not select QUOTAChristoph Hellwig
gfs2 only needs the quotactl code, not the generic quota implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-08Merge branch 'for-next' into for-linusJiri Kosina
Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c
2010-03-07Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy
Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07kobject: Constify struct kset_uevent_opsEmese Revfy
Constify struct kset_uevent_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-05Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05quota: move code from sync_quota_sb into vfs_quota_syncChristoph Hellwig
Currenly sync_quota_sb does a lot of sync and truncate action that only applies to "VFS" style quotas and is actively harmful for the sync performance in XFS. Move it into vfs_quota_sync and add a wait parameter to ->quota_sync to tell if we need it or not. My audit of the GFS2 code says it's also not needed given the way GFS2 implements quotas, but I'd be happy if this can get a detailed review. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) init: Open /dev/console from rootfs mqueue: fix typo "failues" -> "failures" mqueue: only set error codes if they are really necessary mqueue: simplify do_open() error handling mqueue: apply mathematics distributivity on mq_bytes calculation mqueue: remove unneeded info->messages initialization mqueue: fix mq_open() file descriptor leak on user-space processes fix race in d_splice_alias() set S_DEAD on unlink() and non-directory rename() victims vfs: add NOFOLLOW flag to umount(2) get rid of ->mnt_parent in tomoyo/realpath hppfs can use existing proc_mnt, no need for do_kern_mount() in there Mirror MS_KERNMOUNT in ->mnt_flags get rid of useless vfsmount_lock use in put_mnt_ns() Take vfsmount_lock to fs/internal.h get rid of insanity with namespace roots in tomoyo take check for new events in namespace (guts of mounts_poll()) to namespace.c Don't mess with generic_permission() under ->d_lock in hpfs sanitize const/signedness for udf nilfs: sanitize const/signedness in dealing with ->d_name.name ... Fix up fairly trivial (famous last words...) conflicts in drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-03Switch gfs2 to nd_set_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-01GFS2: print glock numbers in hexBob Peterson
This patch changes glock numbers from printing in decimal to hex. Since DLM prints corresponding resource IDs in hex, it makes debugging easier. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-01GFS2: ordered writes are backwardsDave Chinner
When we queue data buffers for ordered write, the buffers are added to the head of the ordered write list. When the log needs to push these buffers to disk, it also walks the list from the head. The result is that the the ordered buffers are submitted to disk in reverse order. For large writes, this means that whenever the log flushes large streams of reverse sequential order buffers are pushed down into the block layers. The elevators don't handle this particularly well, so IO rates tend to be significantly lower than if the IO was issued in ascending block order. Queue new ordered buffers to the tail of the ordered buffer list to ensure that IO is dispatched in the order it was submitted. This should significantly improve large sequential write speeds. On a disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for noop and from 38MB/s to 50MB/s for cfq. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-01GFS2: Remove loopy umount codeSteven Whitehouse
As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-01GFS2: Metadata address space clean upSteven Whitehouse
Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-12GFS2: Fix bmap allocation corner-case bugSteven Whitehouse
This patch solves a corner case during allocation which occurs if both metadata (indirect) and data blocks are required but there is an obstacle in the filesystem (e.g. a resource group header or another allocated block) such that when the allocation is requested only enough blocks for the metadata are returned. By changing the exit condition of this loop, we ensure that a minimum of one data block will always be returned. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-12GFS2: Fix error codeAbhijith Das
We need this one-liner to signal the mount helper of the 'insufficient journals' condition. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>