path: root/fs/gfs2/inode.c
Age | Commit message | Author
2011-07-25 | fs: take the ACL checks to common code | Christoph Hellwig
Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 | simplify gfs2_lookup() | Al Viro
d_splice_alias() will DTRT when given NULL or ERR_PTR Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
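The pass-through behaviour mentioned here follows the kernel's error-pointer idiom. Below is a minimal userspace sketch of that idiom, assuming simplified re-implementations of `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`. The function `splice_alias_sketch()` and both structs are hypothetical stand-ins for illustration and are not the real `d_splice_alias()`.

```c
#include <errno.h>   /* -ENOENT and friends, used by callers */
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes live in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct inode_sketch { int ino; };
struct dentry_sketch { struct inode_sketch *inode; };

/*
 * Hypothetical helper modelling the behaviour the commit relies on:
 * an error pointer is forwarded unchanged, and a NULL inode simply
 * produces a negative dentry. Returning NULL means "use the dentry
 * that was passed in".
 */
static struct dentry_sketch *splice_alias_sketch(struct inode_sketch *inode,
                                                 struct dentry_sketch *dentry)
{
    if (IS_ERR(inode))
        return ERR_PTR(PTR_ERR(inode));
    dentry->inode = inode;   /* NULL here means a negative dentry */
    return NULL;
}
```

Because the helper copes with both special cases itself, a caller can hand it whatever the inode lookup returned without checking first, which is what makes the lookup function shorter.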
2011-07-20 | ->permission() sanitizing: don't pass flags to ->permission() | Al Viro
not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 | ->permission() sanitizing: don't pass flags to generic_permission() | Al Viro
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 | kill check_acl callback of generic_permission() | Al Viro
its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-13 | GFS2: Move all locking inside the inode creation function | Steven Whitehouse
Now that there are no longer any exceptions to the normal inode creation code path, we can move the parts of the locking code which were duplicated in mkdir/mknod/create/symlink into the inode create function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 | GFS2: Clean up symlink creation | Steven Whitehouse
This moves the symlink specific parts of inode creation into the function where we initialise the rest of the dinode. As a result we have one less place where we need to look up the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-13 | GFS2: Clean up mkdir | Steven Whitehouse
This moves the initialisation of the directory into the inode creation functions to avoid having to duplicate the lookup of the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 | GFS2: Rename ops_inode.c to inode.c | Steven Whitehouse
This is the final part of the ops_inode.c/inode.c reordering. We are left with a single file called inode.c which now contains all the inode operations, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-10 | GFS2: Inode.c is empty now, remove it | Steven Whitehouse
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 | GFS2: Move final part of inode.c into super.c | Steven Whitehouse
Now inode.c is empty. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 | GFS2: Move most of the remaining inode.c into ops_inode.c | Steven Whitehouse
This is in preparation to remove inode.c and rename ops_inode.c to inode.c. Also most of the functions which were left in inode.c relate to the creation and lookup of inodes. I'm intending to work on consolidating some of that code, and it's easier when it's all in one place. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 | GFS2: Move gfs2_refresh_inode() and friends into glops.c | Steven Whitehouse
Eventually there will only be a single caller of this code, so let's move it where it can be made static at some future date. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 | GFS2: Remove gfs2_dinode_print() function | Steven Whitehouse
This function was intended for debugging purposes, but it is not very useful. If we want to know what is on disk then all we need is a block number and gfs2_edit can give us much better information about what is there. Otherwise, if we are interested in what is stored in the in-core inode, it doesn't help us out there either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 | GFS2: When adding a new dir entry, inc link count if it is a subdir | Steven Whitehouse
This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation. Now that both adding and removing directory entries automatically update the parent directory's link count if required, that makes the code shorter and simpler than before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
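The symmetry described here can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct dir_sketch`, `dir_add_sketch()` and `dir_del_sketch()` are hypothetical names, and the real GFS2 directory code does far more (hashing, leaf blocks, journaling).

```c
#include <stdbool.h>

/* Hypothetical in-memory model of a directory inode. */
struct dir_sketch {
    unsigned int nlink;    /* link count of the directory itself */
    unsigned int entries;  /* number of entries it holds */
};

/* Adding an entry bumps the parent's link count only for subdirectories,
 * because each subdirectory's ".." refers back to the parent. */
static void dir_add_sketch(struct dir_sketch *parent, bool entry_is_dir)
{
    parent->entries++;
    if (entry_is_dir)
        parent->nlink++;
}

/* Removing an entry is the mirror image. */
static void dir_del_sketch(struct dir_sketch *parent, bool entry_is_dir)
{
    parent->entries--;
    if (entry_is_dir)
        parent->nlink--;
}
```

With both helpers owning the adjustment, callers such as mkdir or rename would not need their own link count code, which is the simplification the commit message describes.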
2011-05-05 | GFS2: Double check link count under glock | Steven Whitehouse
To avoid any possible races relating to the link count, we need to recheck it under the inode's glock in all cases where it matters. Also to ensure we never get any nasty surprises, this patch also ensures that once the link count has hit zero it can never be elevated by rereading in data from disk. The only place we cannot provide a proper solution is in rename in the case where we are removing a target inode and we discover that the target inode has been already unlinked on another node. The race window is very small, and we return EAGAIN in this case to indicate what has happened. The proper solution would be to move the lookup parts of rename from the vfs into library calls which the fs could call directly, but that is potentially a very big job and this fix should cover most cases for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
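The "never elevated once zero" rule can be illustrated with a small userspace sketch. The names `struct incore_inode` and `set_nlink_from_disk()` are hypothetical, and the `is_new` flag is an assumption added here so that the first read of a freshly created in-core inode is still allowed to set the count.

```c
#include <stdbool.h>

/* Hypothetical in-core inode state. */
struct incore_inode {
    unsigned int nlink;
    bool is_new;   /* assumption: true until the first refresh from disk */
};

/*
 * Refresh the in-core link count from the on-disk value.
 * Once an already-initialised inode has reached zero links,
 * a later reread must not bring it back to life.
 */
static void set_nlink_from_disk(struct incore_inode *ip, unsigned int disk_nlink)
{
    if (!ip->is_new && ip->nlink == 0)
        return;   /* sticky zero: ignore whatever the disk says */
    ip->nlink = disk_nlink;
    ip->is_new = false;
}
```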
2011-04-20 | GFS2: Make writeback more responsive to system conditions | Steven Whitehouse
This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 | GFS2: Optimise glock lru and end of life inodes | Steven Whitehouse
The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 | GFS2: filesystem hang caused by incorrect lock order | Bob Peterson
This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
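The effect of the new parameter can be sketched as follows. This is a userspace model only: `struct cached_inode`, `CI_FREEING`, `IGET_WOULD_WAIT` and `iget_sketch()` are hypothetical, and the "would wait" return value stands in for the point where the real lookup would sleep until the inode has been freed.

```c
#include <errno.h>
#include <stdbool.h>

#define CI_FREEING      0x1u  /* illustrative flag value, not the kernel's */
#define IGET_WOULD_WAIT 1     /* hypothetical marker for "caller would sleep" */

/* Hypothetical cached inode with a state word. */
struct cached_inode {
    unsigned int state;
};

/*
 * Look up an inode that may be in the middle of being freed.
 * A blocking caller waits for the freeing to finish; a non-blocking
 * caller (one that already holds the inode glock) gets -ENOENT
 * instead, which breaks the lock-order cycle.
 */
static int iget_sketch(const struct cached_inode *inode, bool non_block)
{
    if (inode->state & CI_FREEING)
        return non_block ? -ENOENT : IGET_WOULD_WAIT;
    return 0;
}
```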
2011-03-08 | Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next | James Morris
2011-02-01 | fs/vfs/security: pass last path component to LSM on inode creation | Eric Paris
SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and then userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a separate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>
2011-01-18 | GFS2: Fix error path in gfs2_lookup_by_inum() | Steven Whitehouse
In the (impossible, except if there is fs corruption) error path in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh() fails, it was leaving the function by calling iput() rather than iget_failed(). This would cause future lookups of the same inode to block forever. This patch fixes the problem by moving the call to gfs2_inode_refresh() into gfs2_inode_lookup() where iget_failed() is part of the error path already. Also this cleans up some unreachable code and makes gfs2_set_iop() static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-07 | Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin | Linus Torvalds
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 | fs: provide rcu-walk aware permission i_ops | Nick Piggin
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-11-30 | GFS2: Remove unreachable calls to vmtruncate | Steven Whitehouse
Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-11-15 | GFS2: Fix inode deallocation race | Steven Whitehouse
This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-20 | GFS2: Remove i_disksize | Steven Whitehouse
With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-08-09 | simplify checks for I_CLEAR/I_FREEING | Al Viro
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
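The simplification can be seen in a small userspace sketch. The bit values below are illustrative only, and all four helper names are hypothetical; the point is just that once the "cleared" state keeps the "freeing" bit set, a single-bit test covers both states.

```c
#include <stdbool.h>

/* Illustrative bit values; the kernel's actual values may differ. */
#define I_FREEING_SKETCH (1u << 0)
#define I_CLEAR_SKETCH   (1u << 1)

/* Old scheme: I_CLEAR replaces I_FREEING when the inode is cleared. */
static unsigned int clear_old(void)
{
    return I_CLEAR_SKETCH;
}

/* New scheme: I_CLEAR is added on top of I_FREEING. */
static unsigned int clear_new(void)
{
    return I_FREEING_SKETCH | I_CLEAR_SKETCH;
}

/* Old check: callers had to know about both bits. */
static bool going_away_old(unsigned int state)
{
    return (state & (I_FREEING_SKETCH | I_CLEAR_SKETCH)) != 0;
}

/* New check: testing I_FREEING alone is enough. */
static bool going_away_new(unsigned int state)
{
    return (state & I_FREEING_SKETCH) != 0;
}
```

Note that the single-bit check only works together with the new way of marking a cleared inode; applied to the old marking it would miss inodes that carry only the clear bit.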
2010-08-09 | remove inode_setattr | Christoph Hellwig
Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant:
 - spufs: explicitly checks for ATTR_SIZE earlier
 - btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 - ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed the opencoded variant to be trimmed down. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-07-15 | GFS2: Fix kernel NULL pointer dereference by dlm_astd | Bob Peterson
This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
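The shape of the bug is an unbalanced reference count on an error path. The following userspace sketch shows a balanced version under stated assumptions: `struct glock_sketch`, `glock_get()`, `glock_put()` and `dinode_lookup_sketch()` are hypothetical, and the real functions also deal with holders, lock modes and the DLM.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reference-counted lock object. */
struct glock_sketch {
    int ref;
};

static void glock_get(struct glock_sketch *gl) { gl->ref++; }
static void glock_put(struct glock_sketch *gl) { gl->ref--; }

/*
 * Take a reference on the iopen glock, then try to take the inode glock.
 * If the second step fails, the error path must drop the iopen reference
 * exactly once. Dropping it twice would release a reference that belongs
 * to someone else and could let the object be freed while still in use.
 */
static int dinode_lookup_sketch(struct glock_sketch *iopen_gl,
                                struct glock_sketch *inode_gl,
                                bool inode_glock_fails)
{
    glock_get(iopen_gl);
    if (inode_glock_fails) {
        glock_put(iopen_gl);   /* single put balances the single get */
        return -EIO;
    }
    glock_get(inode_gl);
    return 0;
}
```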
2010-05-21 | GFS2: Rework reclaiming unlinked dinodes | Bob Peterson
The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-04-14 | GFS2: glock livelock | Bob Peterson
This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2010-03-01 | GFS2: Metadata address space clean up | Steven Whitehouse
Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relate to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock which, in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements:
 1. A saving of approx 25% of memory used in caching inodes
 2. A removal of the circular dependency between inodes and glocks
 3. No confusion between "normal" and "metadata" inodes in super.c
Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-17 | kill I_LOCK | Christoph Hellwig
After I_SYNC was split from I_LOCK the leftover is always used together with I_NEW and thus superfluous. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 | sanitize xattr handler prototypes | Christoph Hellwig
Add a flags argument to struct xattr_handler and pass it to all xattr handler methods. This allows using the same methods for multiple handlers, e.g. for the ACL methods which perform exactly the same action for the access and default ACLs, just using a different underlying attribute. With a little more groundwork it'll also allow sharing the methods for the regular user/trusted/secure handlers in extN, ocfs2 and jffs2 like it's already done for xfs in this patch. Also change the inode argument to the handlers to a dentry to allow using the handlers mechanism for filesystems that require it later, e.g. cifs. [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-03 | GFS2: Tag all metadata with jid | Steven Whitehouse
There are two spare fields in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 | GFS2: Clean up ACLs | Steven Whitehouse
To prepare for support for caching of ACLs, this cleans up the GFS2 ACL support by pushing the xattr code back into xattr.c and changing the acl_get function into one which only returns ACLs so that we can drop the caching function into it shortly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-27 | GFS2: Remove no_formal_ino generating code | Steven Whitehouse
The inum structure used throughout GFS2 has two fields. One, no_addr, is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
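Drawing the number from a per-resource-group counter while skipping zero can be sketched as below. This is a userspace illustration only: `struct rgrp_sketch`, its `igeneration` field and `next_formal_ino_sketch()` are hypothetical names, and the sketch ignores locking and the on-disk update of the counter.

```c
#include <stdint.h>

/* Hypothetical resource group holding a 64-bit generation counter. */
struct rgrp_sketch {
    uint64_t igeneration;
};

/*
 * Hand out the next generation number from the resource group's counter.
 * Zero is never returned: if the counter happens to be zero (fresh or
 * wrapped around), one more step is taken.
 */
static uint64_t next_formal_ino_sketch(struct rgrp_sketch *rgd)
{
    uint64_t gen = rgd->igeneration++;

    if (gen == 0)
        gen = rgd->igeneration++;
    return gen;
}
```

Because the counter is 64 bits wide and simply wraps, there is no "numbers ran out" case to handle; the only special value is zero.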
2009-08-26 | GFS2: Rename eattr.[ch] as xattr.[ch] | Steven Whitehouse
Use the more conventional name for the extended attribute support code. Update all the places which care. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-26 | GFS2: Clean up of extended attribute support | Steven Whitehouse
This has been on my list for some time. We need to change the way in which we handle extended attributes to allow faster file creation times (by reducing the number of transactions required) and the extended attribute code is the main obstacle to this. In addition to that, the VFS provides a way to demultiplex the xattr calls which we ought to be using, rather than rolling our own. This patch changes the GFS2 code to use that VFS feature and as a result the code shrinks by a couple of hundred lines or so, and becomes easier to read. I'm planning on doing further clean up work in this area, but this patch is a good start. The cleaned up code also uses the more usual "xattr" shorthand. I plan to eliminate the use of "eattr" eventually, and in the meantime it serves as a flag as to which bits of the code have been updated. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-08-17 | GFS2: Improve error handling in inode allocation | Steven Whitehouse
A little while back, block allocation was given some improved error handling which meant that -EIO was returned in the case of there being a problem in the resource group data. In addition a message is printed explaining what went wrong and how to fix it. This extends that error handling so that it covers inode allocation too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-22 | GFS2: Move gfs2_unlink_ok into ops_inode.c | Steven Whitehouse
Another function which is only called from ops_inode.c so we move it and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-22 | GFS2: Move gfs2_readlinki into ops_inode.c | Steven Whitehouse
Move gfs2_readlinki into ops_inode.c and make it static Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-22 | GFS2: Move gfs2_rmdiri into ops_inode.c | Steven Whitehouse
Move gfs2_rmdiri() into ops_inode.c and make it static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-22 | GFS2: Clean up some file names | Steven Whitehouse
This patch renames the ops_*.c files which have no counterpart without the ops_ prefix in order to shorten the name and make it more readable. In addition, ops_address.h (which was very small) is moved into inode.h and inode.h is cleaned up by adding extern where required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-04-15 | GFS2: cleanup file_operations mess | Christoph Hellwig
Remove the weird pointer to file_operations mess and replace it with straight-forward defining of the locking instance names to the _nolock variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24 | GFS2: Merge lock_dlm module into GFS2 | Steven Whitehouse
This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as:
 o Reducing overhead by eliminating duplicated fields between structures
 o Simplification of the code (reduces the code size by a fair bit)
 o The locking interface is now the DLM interface itself as proposed some time ago.
 o Fewer lookups of glocks when processing replies from the DLM
 o Fewer memory allocations/deallocations for each glock
 o Scope to do further optimisations in the future (but this patch is more than big enough for now!)
Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem without requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before it's merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off for the last three months and it's passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Kill two daemons with one patch | Steven Whitehouse
This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Banish struct gfs2_dinode_host | Steven Whitehouse
The final field in gfs2_dinode_host was the i_flags field. That's renamed to i_diskflags in order to avoid confusion with the existing inode flags, and moved into the inode proper at a suitable location to avoid creating a "hole". At that point struct gfs2_dinode_host is no longer needed and as promised (quite some time ago!) it can now be removed completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05 | GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize | Steven Whitehouse
This patch moves the i_size field out of gfs2_dinode_host and, following the ext3 convention, renames it i_disksize. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>