summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2012-01-18nfs: fix regression in handling of context= option in NFSv4Jeff Layton
commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8 upstream. Setting the security context of a NFSv4 mount via the context= mount option is currently broken. The NFSv4 codepath allocates a parsed options struct, and then parses the mount options to fill it. It eventually calls nfs4_remote_mount which calls security_init_mnt_opts. That clobbers the lsm_opts struct that was populated earlier. This bug also looks like it causes a small memory leak on each v4 mount where context= is used. Fix this by moving the initialization of the lsm_opts into nfs_alloc_parsed_mount_data. Also, add a destructor for nfs_parsed_mount_data to make it easier to free all of the allocations hanging off of it, and to ensure that the security_free_mnt_opts is called whenever security_init_mnt_opts is. I believe this regression was introduced quite some time ago, probably by commit c02d7adf. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-18NFSv4: include bitmap in nfsv4 get acl dataAndy Adamson
commit bf118a342f10dafe44b14451a1392c3254629a1f upstream. The NFSv4 bitmap size is unbounded: a server can return an arbitrary sized bitmap in an FATTR4_WORD0_ACL request. Replace using the nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data xdr length to the (cached) acl page data. This is a general solution to commit e5012d1f "NFSv4.1: update nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-18NFS - fix recent breakage to NFS error handling.NeilBrown
commit 2edb6bc3852c681c0d948245bd55108dc6407604 upstream. From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@suse.de> Date: Wed, 16 Nov 2011 09:39:05 +1100 Subject: NFS - fix recent breakage to NFS error handling. commit 02c24a82187d5a628c68edfe71ae60dc135cd178 made a small and presumably unintended change to write error handling in NFS. Previously an error from filemap_write_and_wait_range would only be of interest if nfs_file_fsync did not return an error. After this commit, an error from filemap_write_and_wait_range would mean that (the rest of) nfs_file_fsync would not even be called. This means that: 1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC. 2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are synchronous. This patch restores previous behaviour. Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-18NFSv4.1: fix backchannel slotid off-by-one bugAndy Adamson
commit 61f2e5106582d02f30b6807e3f9c07463c572ccb upstream. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.Trond Myklebust
commit 111d489f0fb431f4ae85d96851fbf8d3248c09d8 upstream. Currently, the code assumes that the SEQUENCE status bits are mutually exclusive. They are not... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06NFS: Fix a regression in nfs_file_llseek()Trond Myklebust
commit 6c52961743f38747401b47127b82159ab6d8a7a4 upstream. After commit 06222e491e663dac939f04b125c9dc52126a75c4 (fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek) the behaviour of llseek() was changed so that it always revalidates the file size. The bug appears to be due to a logic error in the afore-mentioned commit, which always evaluates to 'true'. Reported-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26nfs: when attempting to open a directory, fall back on normal lookup (try #5)Jeff Layton
commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7 upstream. commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return from an OPEN call. Prior to that patch, that caused the client to fall back to doing a normal lookup. When that patch went in, the code began returning that error to userspace. The d_revalidate codepath however never had the corresponding change, so it was still possible to end up with a NULL ctx->state pointer after that. That patch caused a regression. When we attempt to open a directory that does not have a cached dentry, that open now errors out with EISDIR. If you attempt the same open with a cached dentry, it will succeed. Fix this by reverting the change in nfs_atomic_lookup and allowing attempts to open directories to fall back to a normal lookup Also, add a NFSv4-specific f_ops->open routine that just returns -ENOTDIR. This should never be called if things are working properly, but if it ever is, then the dprintk may help in debugging. To facilitate this, a new file_operations field is also added to the nfs_rpc_ops struct. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11nfs: don't try to migrate pages with active requestsJeff Layton
commit 2da956523526e440ef4f4dd174e26f5ac06fe011 upstream. nfs_find_and_lock_request will take a reference to the nfs_page and will then put it if the req is already locked. It's possible though that the reference will be the last one. That put then can kick off a whole series of reference puts: nfs_page nfs_open_context dentry inode If the inode ends up being deleted, then the VFS will call truncate_inode_pages. That function will try to take the page lock, but it was already locked when migrate_page was called. The code deadlocks. Fix this by simply refusing the migration request if PagePrivate is already set, indicating that the page is already associated with an active read or write request. We've had a customer test a backported version of this patch and the preliminary results seem good. Cc: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pagesJeff Layton
commit 3236c3e1adc0c7ec83eaff1de2d06746b7c5bb28 upstream. commit 420e3646 allowed the kernel to reduce the number of unnecessary commit calls by skipping the commit when there are a large number of outstanding pages. However, the current test in nfs_commit_unstable_pages does not handle the edge condition properly. When ncommit == 0, then that means that the kernel doesn't need to do anything more for the inode. The current test though in the WB_SYNC_NONE case will return true, and the inode will end up being marked dirty. Once that happens the inode will never be clean until there's a WB_SYNC_ALL flush. Fix this by immediately returning from nfs_commit_unstable_pages when ncommit == 0. Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which has a backported version of 420e3646. The inode cache there was growing very large. The inode cache was unable to be shrunk since the inodes were all marked dirty. Calling sync() would essentially "fix" the problem -- the WB_SYNC_ALL flush would result in the inodes all being marked clean. What I'm not clear on is how big a problem this is in mainline kernels as the writeback code there is very different. Either way, it seems incorrect to re-mark the inode dirty in this case. Reported-by: Mike McLean <mikem@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11SUNRPC/NFS: make rpc pipe upcall genericPeng Tao
commit c1225158a8dad9e9d5eee8a17dbbd9c7cda05ab9 upstream. The same function is used by idmap, gss and blocklayout code. Make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when ↵Trond Myklebust
syncing" commit 59b7c05fffba030e5d9e72324691e2f99aa69b79 upstream. This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2. The reverted commit was rendered obsolete by a VFS fix: commit 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in two steps). We now no longer need to worry about writeback_single_inode() missing our marking the inode for COMMIT in 'do_writepages()' call. Reverting this patch, fixes a performance regression in which the inode would continuously get queued to the dirty list, causing the writeback code to unnecessarily try to send a COMMIT. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfsblock: fix writeback deadlockPeng Tao
commit 7542274519b3ba87555410c66e8356ac1e3bc9b3 upstream. We should check if the sector is already initialized before trying to grab the page from page cache. Otherwise when two pages of the same block are written back by two threads each calling from writepage_locked, it can cause deadlock like bellow. [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds. [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1080.972812] kswapd0 D ffff88000c4926c0 0 25 2 0x00000000 [ 1080.972816] ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7 [ 1080.972821] ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440 [ 1080.972824] ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8 [ 1080.972828] Call Trace: [ 1080.972860] [<ffffffff81013ba7>] ? read_tsc+0x9/0x19 [ 1080.972877] [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b [ 1080.972899] [<ffffffff81475a1d>] io_schedule+0x63/0x7e [ 1080.972902] [<ffffffff810e0b31>] sleep_on_page+0xe/0x12 [ 1080.972905] [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f [ 1080.972916] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72 [ 1080.972919] [<ffffffff810e0af6>] __lock_page+0x66/0x68 [ 1080.972928] [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d [ 1080.972932] [<ffffffff810e0b1f>] lock_page+0x27/0x2b [ 1080.972934] [<ffffffff810e0bcf>] find_lock_page+0x34/0x57 [ 1080.972937] [<ffffffff810e1738>] find_or_create_page+0x34/0x8a [ 1080.972947] [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver] [ 1080.972951] [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver] [ 1080.972995] [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs] [ 1080.973033] [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs] [ 1080.973089] [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs] [ 1080.973098] [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs] [ 1080.973108] [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs] [ 1080.973119] [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs] [ 1080.973122] [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a [ 1080.973133] [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8 [ 1080.973139] [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e [ 1080.973144] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72 [ 1080.973148] [<ffffffff810f0c33>] shrink_zone+0x445/0x588 [ 1080.973152] [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b [ 1080.973170] [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80 [ 1080.973175] [<ffffffff810f1f78>] kswapd+0x2be/0x2fa [ 1080.973179] [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b [ 1080.973183] [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b [ 1080.973187] [<ffffffff81071f69>] kthread+0xa8/0xb0 [ 1080.973200] [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10 [ 1080.973205] [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a [ 1080.973210] [<ffffffff814806b0>] ? gs_change+0x13/0x13 [ 1080.973213] no locks held by kswapd0/25. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfsblock: fix NULL pointer dereferencePeng Tao
commit e6d05a757c314ad88d0649d3835a8a1daa964236 upstream. bl_add_page_to_bio returns error pointer. bio should be reset to NULL in failure cases as the out path always calls bl_submit_bio. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfs: recoalesce when ld read pagelist failsPeng Tao
commit 9b7eecdcfeb943f130d86bbc249fde4994b6fe30 upstream. For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfs: recoalesce when ld write pagelist failsPeng Tao
commit 8ce160c5ef06cc89c2b6b26bfa5ef7a5ce2c93e0 upstream. For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfs: make _set_lo_fail genericPeng Tao
commit 1b0ae068779874f54b55aac3a2a992bcf3f2c3c4 upstream. file layout and block layout both use it to set mark layout io failure bit. So make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfsblock: add missing rpc_put_mount and path_putPeng Tao
commit 760383f1ee4d14b0e0bdf0cddee648d9b8633429 upstream. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfsblock: fix size of upcall messageJim Rees
commit fdc17abbc4b6094b34ee8ff5d91eaba8637594a2 upstream. Make the status field explicitly 32 bits. "...it's unlikely that the kernel and userspace would differ on the size of an int here, but it might be a good idea to go ahead and make that explicitly 32 bits in case we end up dealing with more exotic arches at some point in the future." Suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11pnfsblock: fix return code confusionJim Rees
commit 516f2e24faa7548a61d9ba790958528469c2e284 upstream. Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and nfs4_blk_decode_device. Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling nfs4_blk_get_deviceinfo. Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-26VFS: Fix the remaining automounter semantics regressionsTrond Myklebust
The concensus seems to be that system calls such as stat() etc should not trigger an automount. Neither should the l* versions. This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups that _should_ trigger an automount on the last path element. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [ Edited to leave out the cases that are already covered by LOOKUP_OPEN, LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-13nfs: Do not allow multiple mounts on same mountpoint when using -o noacSachin Prabhu
Do not allow multiple mounts on same mountpoint when using -o noac When you normally attempt to mount a share twice on the same mountpoint, a check in do_add_mount causes it to return an error # mount localhost:/nfsv3 /mnt # mount localhost:/nfsv3 /mnt mount.nfs: /mnt is already mounted or busy However when using the option 'noac', the user is able to mount the same share on the same mountpoint multiple times. This happens because a share mounted with the noac option is automatically assigned the 'sync' flag MS_SYNCHRONOUS in nfs_initialise_sb(). This flag is set after the check for already existing superblocks is done in sget(). The check for the mount flags in nfs_compare_mount_options() does not take into account the 'sync' flag applied later on in the code path. This means that when using 'noac', a new superblock structure is assigned for every new mount of the same share and multiple shares on the same mountpoint are allowed. ie. # mount -onoac localhost:/nfsv3 /mnt can be run multiple times. The patch checks for noac and assigns the sync flag before sget() is called to obtain an already existing superblock structure. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-09-13NFS: Fix a typo in nfs_flush_multiTrond Myklebust
Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Sricharan R <r.sricharan@ti.com>
2011-08-24NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN errorTrond Myklebust
The NFSv4 spec does not specify that the server must repeat that error, so in order to avoid having the delegations revoked, we should handle it immediately. Also note that NFS4ERR_CB_PATH_DOWN does in fact renew the lease... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegationTrond Myklebust
RFC3530 states that if the client holds a delegation, then it is obliged to continue to send RENEW calls once every lease period in order to allow the server to return NFS4ERR_CB_PATH_DOWN if the callback path is unreachable. This is not required for NFSv4.1, since the server can at any time set the SEQ4_STATUS_CB_PATH_DOWN_SESSION in any SEQUENCE operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: nfs4_proc_renew should be declared staticTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocationTrond Myklebust
We shouldn't allow the renew daemon to do direct reclaim on the NFS partition. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-18Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets NFSv4.1: Fix the callback 'highest_used_slotid' behaviour pnfs-obj: Fix the comp_index != 0 case pnfs-obj: Bug when we are running out of bio nfs: add missing prefetch.h include
2011-08-11pnfs: Automatically select blocks & objects layoutsBoaz Harrosh
Just like files-layout, blocks & objects layouts are part of the NFS 4.1 protocol and should be automatically selected if NFS_4_1 is selected. The small problem is that these depend on other Kernel support being present, while files only depends on NFS itself. This patch removes from the user choice the presence of objects and blocks layout. But makes sure these are selected only if the depended subsystems are present in the Kernel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11NFS41: make PNFS_BLOCK selectablePeng Tao
PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other pnfs layout drivers. Seperate it out so others can still build when BLK_DEV_DM/MD is not enabled. Also change select to depends on to avoid build failures. Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Peng Tao <peng_tao@emc.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resetsTrond Myklebust
If the client is in the process of resetting the session when it receives a callback, then returning NFS4ERR_DELAY may cause a deadlock with the DESTROY_SESSION call. Basically, if the client returns NFS4ERR_DELAY in response to the CB_SEQUENCE call, then the server is entitled to believe that the client is busy because it is already processing that call. In that case, the server is perfectly entitled to respond with a NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call. Fix this by having the client reply with a NFS4ERR_BADSESSION in response to the callback if it is resetting the session. Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04NFSv4.1: Fix the callback 'highest_used_slotid' behaviourTrond Myklebust
Currently, there is no guarantee that we will call nfs4_cb_take_slot() even though nfs4_callback_compound() will consistently call nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field. The result is that we can trigger the BUG_ON() upon the next call to nfs4_cb_take_slot(). This patch fixes the above problem by using the slot id that was taken in the CB_SEQUENCE operation as a flag for whether or not we need to call nfs4_cb_free_slot(). It also fixes an atomicity problem: we need to set tbl->highest_used_slotid atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up racing with the various tests in nfs4_begin_drain_session(). Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04pnfs-obj: Fix the comp_index != 0 caseBoaz Harrosh
There were bugs in the case of partial layout where olo_comp_index is not zero. This used to work and was tested but one of the later cleanup SQUASHMEs broke it and was not tested since. Also add a dprint that specify those received layout parameters. Everything else was already printed. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04pnfs-obj: Bug when we are running out of bioBoaz Harrosh
When we have a situation that the number of pages we want to encode is bigger then the size of the bio. (Which can currently happen only when all IO is going to a single device .e.g group_width==1) then the IO is submitted short and we report back only the amount of bytes we actually wrote/read and all is fine. BUT ... There was a bug that the current length counter was advanced before the fail to add the extra page, and we come to a situation that the CDB length was one-page longer then the actual bio size, which is of course rejected by the osd-target. While here also fix the bio size calculation, in the case that we received more then one group of devices. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04nfs: add missing prefetch.h includeHeiko Carstens
Fix this compile error on s390: CC [M] fs/nfs/blocklayout/blocklayout.o fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read': fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw' Introduced with 9549ec01 "pnfsblock: bl_read_pagelist". Cc: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree() VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree() switch posix_acl_chmod() to umode_t switch posix_acl_from_mode() to umode_t switch posix_acl_equiv_mode() to umode_t * switch posix_acl_create() to umode_t * block: initialise bd_super in bdget() vfs: avoid call to inode_lru_list_del() if possible vfs: avoid taking inode_hash_lock on pipes and sockets vfs: conditionally call inode_wb_list_del() VFS: Fix automount for negative autofs dentries Btrfs: load the key from the dir item in readdir into a fake dentry devtmpfs: missing initialialization in never-hit case hppfs: missing include
2011-08-01switch posix_acl_create() to umode_t *Al Viro
so we can pass &inode->i_mode to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-31NFS: Re-enable compilation of nfs with !CONFIG_NFS_V4 || !CONFIG_NFS_V4_1Trond Myklebust
Fix two recently introduced compile problems: Fix a typo in fs/nfs/pnfs.h Move the pnfs_blksize declaration outside the CONFIG_NFS_V4 section in struct nfs_server. Reported-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-31Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) pnfsblock: write_pagelist handle zero invalid extents pnfsblock: note written INVAL areas for layoutcommit pnfsblock: bl_write_pagelist pnfsblock: bl_read_pagelist pnfsblock: cleanup_layoutcommit pnfsblock: encode_layoutcommit pnfsblock: merge rw extents pnfsblock: add extent manipulation functions pnfsblock: bl_find_get_extent pnfsblock: xdr decode pnfs_block_layout4 pnfsblock: call and parse getdevicelist pnfsblock: merge extents pnfsblock: lseg alloc and free pnfsblock: remove device operations pnfsblock: add device operations pnfsblock: basic extent code pnfsblock: use pageio_ops api pnfsblock: add blocklayout Kconfig option, Makefile, and stubs pnfs: cleanup_layoutcommit pnfs: ask for layout_blksize and save it in nfs_server ...
2011-07-31pnfsblock: write_pagelist handle zero invalid extentsPeng Tao
For invalid extents, find other pages in the same fsblock and write them out. [pnfsblock: write_begin] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: note written INVAL areas for layoutcommitFred Isaman
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: bl_write_pagelistFred Isaman
Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: bl_write_pagelist support functions] [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> [SQUASHME: pnfsblock: mds_offset is set in the generic layer] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fixup blksize alignment in bl_setup_layoutcommit] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new write_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: bl_read_pagelistFred Isaman
Note: When upper layer's read/write request cannot be fulfilled, the block layout driver shouldn't silently mark the page as error. It should do what can be done and leave the rest to the upper layer. To do so, we should set rdata/wdata->res.count properly. When upper layer re-send the read/write request to finish the rest part of the request, pgbase is the position where we should start at. [pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: read path error handling] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: handle errors when read or write pagelist.] Signed-off-by: Zhang Jingwang <yyalone@gmail.com> [pnfs-block: use new read_pagelist api] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: cleanup_layoutcommitFred Isaman
In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: introduce bl_committing list] Signed-off-by: Peng Tao <peng_tao@emc.com> [pnfsblock: SQUASHME: adjust to API change] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: cleanup_layoutcommit wants a status parameter] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: encode_layoutcommitFred Isaman
In blocklayout driver. There are two things happening while layoutcommit/cleanup. 1. the modified extents are encoded. 2. On cleanup the extents are put back on the layout rw extents list, for reads. In the new system where actual xdr encoding is done in encode_layoutcommit() directly into xdr buffer, these are the new commit stages: 1. On setup_layoutcommit, the range is adjusted as before and a structure is allocated for communication with bl_encode_layoutcommit && bl_cleanup_layoutcommit (Generic layer provides a void-star to hang it on) 2. bl_encode_layoutcommit is called to do the actual encoding directly into xdr. The commit-extent-list is not freed and is stored on above structure. FIXME: The code is not yet converted to the new XDR cleanup 3. On cleanup the commit-extent-list is put back by a call to set_to_rw() as before, but with no need for XDR decoding of the list as before. And the commit-extent-list is freed. Finally allocated structure is freed. [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [blocklayout: encode_layoutcommit implementation] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfsblock: fix bug setting up layoutcommit.] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> [pnfsblock: prevent commit list corruption] [pnfsblock: fix layoutcommit with an empty opaque] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: merge rw extentsFred Isaman
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: add extent manipulation functionsFred Isaman
Adds working implementations of various support functions to handle INVAL extents, needed by writes, such as bl_mark_sectors_init and bl_is_sector_init. [pnfsblock: fix 64-bit compiler warnings for extent manipulation] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [Implement release_inval_marks] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: bl_find_get_extentFred Isaman
Implement bl_find_get_extent(), one of the core extent manipulation routines. [pnfsblock: Lookup list entry of layouts and tags in reverse order] Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> pnfsblock: fix print format warnings for sector_t and size_t gcc spews warnings about these on x86_64, e.g.: fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’ fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’ Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: xdr decode pnfs_block_layout4Fred Isaman
XDR decodes the block layout payload sent in LAYOUTGET result, storing the result in an extent list. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix bug getting pnfs_layout_type in translate_devid().] Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: call and parse getdevicelistFred Isaman
Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO for each device returned. [pnfsblock: get rid of deprecated xdr macros] Signed-off-by: Jim Rees <rees@umich.edu> [pnfsblock: fix pnfs_deviceid references] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: fix print format warnings for sector_t and size_t] [pnfs-block: #include <linux/vmalloc.h>] [pnfsblock: no PNFS_NFS_SERVER] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfsblock: fix bug determining size of striped volume] [pnfsblock: fix oops when using multiple devices] Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [pnfsblock: get rid of vmap and deviceid->area structure] Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31pnfsblock: merge extentsFred Isaman
Replace a stub, so that extents underlying the layouts are properly added, merged, or ignored as necessary. Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> [pnfsblock: delete the new node before put it] Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>