summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2013-01-05nfs: avoid dereferencing null pointer in initiate_bulk_drainingNickolai Zeldovich
Fix an inverted null pointer check in initiate_bulk_draining(). Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2013-01-04NFS: Ensure that we free the rpc_task after read and write cleanups are doneTrond Myklebust
This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org [>= 3.5]
2013-01-04nfs: fix null checking in nfs_get_option_str()Xi Wang
The following null pointer check is broken. *option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!*option' instead. The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-01-04pnfs: Increase the refcount when LAYOUTGET fails the first timeYanchuan Nian
The layout will be set unusable if LAYOUTGET fails. Is it reasonable to increase the refcount iff LAYOUTGET fails the first time? Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2013-01-03NFS: Fix access to suid/sgid executablesWeston Andros Adamson
nfs_open_permission_mask() should only check MAY_EXEC for files that are opened with __FMODE_EXEC. Also fix NFSv4 access-in-open path in a similar way -- openflags must be used because fmode will not always have FMODE_EXEC set. This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-12-21NFS: Kill fscache warnings when mounting without -ofscTrond Myklebust
The fscache code will currently bleat a "non-unique superblock keys" warning even if the user is mounting without the 'fsc' option. There should be no reason to even initialise the superblock cache cookie unless we're planning on using fscache for something, so ensure that we check for the NFS_OPTION_FSCACHE flag before calling into the fscache code. Reported-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David Howells <dhowells@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=nDavid Howells
Provide a stub nfs_fscache_wait_on_invalidate() function for when CONFIG_NFS_FSCACHE=n lest the following error appear: fs/nfs/inode.c: In function 'nfs_invalidate_mapping': fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20NFS4: Open files for fscachingDavid Howells
nfs4_file_open() should open files for fscaching. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a pageDavid Howells
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably leading to the following bad-page-state: BUG: Bad page state in process python-bin pfn:17d39b page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null) index:38686 (Tainted: G B ---------------- ) Pid: 31053, comm: python-bin Tainted: G B ---------------- 2.6.32-71.24.1.el6.x86_64 #1 Call Trace: [<ffffffff8111bfe7>] bad_page+0x107/0x160 [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220 [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0 [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130 [<ffffffff8112230c>] release_pages+0x21c/0x250 [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0 [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70 [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180 [<ffffffff811226f8>] __lru_cache_add+0x58/0x70 [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40 [<ffffffff81123f49>] putback_lru_page+0x69/0x100 [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0 [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180 [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370 [<ffffffff8115255c>] compact_zone+0x4cc/0x600 [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820 [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0 [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0 [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170 [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850 [<ffffffff814c9136>] ? thread_return+0x4e/0x778 [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150 [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340 [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30 [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0 [<ffffffff814ce383>] do_page_fault+0x123/0x3a0 [<ffffffff814cbdf5>] page_fault+0x25/0x30 nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait - even if __GFP_WAIT is set. The reason that doesn't wait is that fscache_maybe_release_page() might deadlock the allocator as the work threads writing to the cache may all end up sleeping on memory allocation. However, I wonder if that is actually a problem. There are a number of things I can do to deal with this: (1) Make nfs_migrate_page() wait. (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag. (3) Set a timeout around the wait. (4) Make nfs_migrate_page() return an error if the page is still busy. For the moment, I'll select (2) and (4). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2012-12-20NFS: Use FS-Cache invalidationDavid Howells
Use the new FS-Cache invalidation facility from NFS to deal with foreign changes being detected on the server rather than attempting to retire the old cookie and get a new one. The problem with the old method was that NFS did not wait for all outstanding storage and retrieval ops on the cache to complete. There was no automatic wait between the calls to ->readpages() and calls to invalidate_inode_pages2() as the latter can only wait on locked pages that have been added to the pagecache (which they haven't yet on entry to ->readpages()). This was leading to oopses like the one below when an outstanding read got cut off from its cookie by a premature release. BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] PGD 15889067 PUD 15890067 PMD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064 /DG965RY RIP: 0010:[<ffffffffa0075118>] [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache] RSP: 0018:ffff8800158799e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0 RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90 RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002 R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4 R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8 FS: 00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040) Stack: ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040 Call Trace: [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs] [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs] [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs] [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-18Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code. Remove altogether where possible, and replace with WARN_ON_ONCE and appropriate error returns where not. - NFSv4.1 client adds session dynamic slot table management. There is matching server side code that has been submitted to Bruce for consideration. Together, this code allows the server to dynamically manage the amount of memory it allocates to the duplicate request cache for each client. It will constantly resize those caches to reserve more memory for clients that are hot while shrinking caches for those that are quiescent. In addition, there are assorted bugfixes for the generic NFS write code, fixes to deal with the drop_nlink() warnings, and yet another fix for NFSv4 getacl." * tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits) SUNRPC: continue run over clients list on PipeFS event instead of break NFS: Don't use SetPageError in the NFS writeback code SUNRPC: variable 'svsk' is unused in function bc_send_request SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket NFSv4.1: Deal effectively with interrupted RPC calls. NFSv4.1: Move the RPC timestamp out of the slot. NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED. NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 NFS: Fix calls to drop_nlink() NFS: Ensure that we always drop inodes that have been marked as stale nfs: Remove unused list nfs4_clientid_list nfs: Remove duplicate function declaration in internal.h NFS: avoid NULL dereference in nfs_destroy_server SUNRPC handle EKEYEXPIRED in call_refreshresult SUNRPC set gss gc_expiry to full lifetime nfs: fix page dirtying in NFS DIO read codepath nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ NFSv4.1: Be conservative about the client highest slotid NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly nfs: don't extend writes to cover entire page if pagecache is invalid ...
2012-12-17lseek: the "whence" argument is called "whence"Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-15NFS: Don't use SetPageError in the NFS writeback codeTrond Myklebust
The writeback code is already capable of passing errors back to user space by means of the open_context->error. In the case of ENOSPC, Neil Brown is reporting seeing 2 errors being returned. Neil writes: "e.g. if /mnt2/ if an nfs mounted filesystem that has no space then strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1 reported Input/output error and the relevant parts of the strace output are: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 EIO (Input/output error) close(1) = -1 ENOSPC (No space left on device)" Neil then shows that the duplication of error messages appears to be due to the use of the PageError() mechanism, which causes filemap_fdatawait_range to return the extra EIO. The regression was introduced by commit 7b281ee026552f10862b617a2a51acf49c829554 (NFS: fsync() must exit with an error if page writeback failed). Fix this by removing the call to SetPageError(), and just relying on open_context->error reporting the ENOSPC back to fsync(). Reported-by: Neil Brown <neilb@suse.de> Tested-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.6+]
2012-12-15NFSv4.1: Deal effectively with interrupted RPC calls.Trond Myklebust
If an RPC call is interrupted, assume that the server hasn't processed the RPC call so that the next time we use the slot, we know that if we get a NFS4ERR_SEQ_MISORDERED or NFS4ERR_SEQ_FALSE_RETRY, we just have to bump the sequence number. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15NFSv4.1: Move the RPC timestamp out of the slot.Trond Myklebust
Shave a few bytes off the slot table size by moving the RPC timestamp into the sequence results. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.Trond Myklebust
If the server returns NFS4ERR_SEQ_MISORDERED, it could be a sign that the slot was retired at some point. Retry the attempt after reinitialising the slot sequence number to 1. Also add a handler for NFS4ERR_SEQ_FALSE_RETRY. Just bump the slot sequence number and retry... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0Trond Myklebust
If the inode has no links, then we should force a new lookup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14NFS: Fix calls to drop_nlink()Trond Myklebust
It is almost always wrong for NFS to call drop_nlink() after removing a file. What we really want is to mark the inode's attributes for revalidation, and we want to ensure that the VFS drops it if we're reasonably sure that this is the final unlink(). Do the former using the usual cache validity flags, and the latter by testing if inode->i_nlink == 1, and clearing it in that case. This also fixes the following warning reported by Neil Brown and Jeff Layton (among others). [634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442] Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm: bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace: [634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0 [634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f [634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0 [634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40 [634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs] [634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230 [634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230 [634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90 [634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100 [634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b [634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [3.3+]
2012-12-14NFS: Ensure that we always drop inodes that have been marked as staleTrond Myklebust
There is no need to cache stale inodes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-13nfs: Remove unused list nfs4_clientid_listYanchuan Nian
This list was designed to store struct nfs4_client in the client side. But nfs4_client was obsolete and has been removed from the source code. So remove the unused list. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-13nfs: Remove duplicate function declaration in internal.hYanchuan Nian
Remove duplicate function declaration in internal.h Signed-off-by: Yanchuan Nian <ycnian@gmail.com> [Trond: Added nfs_pageio_init_read, which suffered from the same problem] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12NFS: avoid NULL dereference in nfs_destroy_serverNeilBrown
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get an error between setting server->destory (to nfs_destroy_server), and calling nfs_start_lockd (which will set server->nlm_host). If this happens, nfs_clone_server will call nfs_free_server which will call nfs_destroy_server and thence nlmclnt_done(NULL). This causes the NULL to be dereferenced. So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL. The other guards there are irrelevant as nlm_host can only be non-NULL if one of these flags are set - so remove those tests. (Thanks to Trond for this suggestion). This is suitable for any stable kernel since 2.6.25. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12SUNRPC handle EKEYEXPIRED in call_refreshresultAndy Adamson
Currently, when an RPCSEC_GSS context has expired or is non-existent and the users (Kerberos) credentials have also expired or are non-existent, the client receives the -EKEYEXPIRED error and tries to refresh the context forever. If an application is performing I/O, or other work against the share, the application hangs, and the user is not prompted to refresh/establish their credentials. This can result in a denial of service for other users. Users are expected to manage their Kerberos credential lifetimes to mitigate this issue. Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number of times to refresh the gss_context, and then return -EACCES to the application. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12nfs: fix page dirtying in NFS DIO read codepathJeff Layton
The NFS DIO code will dirty pages that catch read responses in order to handle the case where someone is doing DIO reads into an mmapped buffer. The existing code doesn't really do the right thing though since it doesn't take into account the case where we might be attempting to read past the EOF. Fix the logic in that code to only dirty pages that ended up receiving data from the read. Note too that it really doesn't matter if NFS_IOHDR_ERROR is set or not. All that matters is if the page was altered by the read. Cc: Fred Isaman <iisaman@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12nfs: don't zero out the rest of the page if we hit the EOF on a DIO READJeff Layton
Eryu provided a test program that would segfault when attempting to read past the EOF on file that was opened O_DIRECT. The buffer given to the read() call was on the stack, and when he attempted to read past it it would scribble over the rest of the stack page. If we hit the end of the file on a DIO READ request, then we don't want to zero out the rest of the buffer. These aren't pagecache pages after all, and there's no guarantee that the buffers that were passed in represent entire pages. Cc: <stable@vger.kernel.org> # v3.5+ Cc: Fred Isaman <iisaman@netapp.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11NFSv4.1: Be conservative about the client highest slotidTrond Myklebust
If the server sends us a target that looks like an outlier, but is lower than the existing target, then respect it anyway. However defer actually updating the generation counter until we get a target that doesn't look like an outlier. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11NFSv4.1: Handle NFS4ERR_BADSLOT errors correctlyTrond Myklebust
Most (all) NFS4ERR_BADSLOT errors are due to the client failing to respect the server's sr_highest_slotid limit. This mainly happens due to reordered RPC requests. The way to handle it is simply to drop the slot that we're using, and retry using the new highest_slotid limits. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11Merge branch 'bugfixes' into nfs-for-nextTrond Myklebust
2012-12-11nfs: don't extend writes to cover entire page if pagecache is invalidJeff Layton
Jian reported that the following sequence would leave "testfile" with corrupt data: # mount localhost:/export /mnt/nfs/ -o vers=3 # echo abc > /mnt/nfs/testfile; echo def >> /export/testfile; echo ghi >> /mnt/nfs/testfile # cat -v /export/testfile abc ^@^@^@^@ghi While there's no locking involved here, the operations are serialized, so CTO should prevent corruption. The first write to the file is fine and writes 4 bytes. The file is then extended on the server. When it's reopened a GETATTR is issued and the size change is noticed. This causes NFS_INO_INVALID_DATA to be set on the file. Because the file is opened for write only, nfs_want_read_modify_write() returns 0 to nfs_write_begin(). nfs_updatepage then calls nfs_write_pageuptodate() to see if it should extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is still set on the file at that point, but that flag is ignored and nfs_pageuptodate erroneously extends the write to cover the whole page, with the write done on the server side filled in with zeroes. This patch just has that function check for NFS_INO_INVALID_DATA in addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking over the code, I wonder if we might have a similar bug in nfs_revalidate_size(). The difference between those two flags is very subtle, so it seems like we ought to be checking for NFS_INO_INVALID_DATA in most of the places that we look for NFS_INO_REVAL_PAGECACHE. I believe this is regression introduced by commit 8d197a568. The code did check for NFS_INO_INVALID_DATA prior to that patch. Original bug report is here: https://bugzilla.redhat.com/show_bug.cgi?id=885743 Cc: <stable@vger.kernel.org> # 3.5+ Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11NFSv4: Check for buffer length in __nfs4_get_acl_uncachedSven Wegener
Commit 1f1ea6c "NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached" accidently dropped the checking for too small result buffer length. If someone uses getxattr on "system.nfs4_acl" on an NFSv4 mount supporting ACLs, the ACL has not been cached and the buffer suplied is too short, we still copy the complete ACL, resulting in kernel and user space memory corruption. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Try to eliminate outliers when updating target_highest_slotidTrond Myklebust
Look for sudden changes in the first and second derivatives in order to eliminate outlier changes to target_highest_slotid (which are due to out-of-order RPC replies). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Ensure smooth handover of slots from one task to the next waitingTrond Myklebust
Currently, we see a lot of bouncing for the value of highest_used_slotid due to the fact that slots are getting freed, instead of getting instantly transmitted to the next waiting task. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Don't mess with task priorities in nfs41_setup_sequenceTrond Myklebust
We want to preserve the rpc_task priority for things like writebacks, that may have differing levels of urgency. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFS: Remove _nfs_call_sync_sessionBryan Schumaker
All it does is pass its arguments through to another function. Let's cut out the middleman... Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4: Clean up handling of privileged operationsTrond Myklebust
Privileged rpc calls are those that are run by the state recovery thread, in cases where we're trying to recover the system after a server reboot or a network partition. In those cases, we want to fence off all other rpc calls (see nfs4_begin_drain_session()) so that they don't end up using stateids or clientids that are in the process of being recovered. Prior to this patch, we had to set up special callback functions in order to declare an rpc call as being privileged. By adding a new field to the sequence arguments, this patch simplifies things considerably, and allows us to declare the rpc call as privileged before it is run. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Remove the 'FIFO' behaviour for nfs41_setup_sequenceTrond Myklebust
It is more important to preserve the task priority behaviour, which ensures that things like reclaim writes take precedence over background and kupdate writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Clean up nfs41_setup_sequenceTrond Myklebust
Move all the sleep-and-exit cases into a single section of code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4: Simplify the NFSv4/v4.1 synchronous call switchTrond Myklebust
We shouldn't need to pass the 'cache_reply' parameter if we initialise the sequence_args/sequence_res in the caller. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Simplify the sequence setupTrond Myklebust
Nobody calls nfs4_setup_sequence or nfs41_setup_sequence without also calling rpc_call_start() on success. This commit therefore folds the rpc_call_start call into nfs41_setup_sequence(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Use nfs41_setup_sequence where appropriateTrond Myklebust
There is no point in using nfs4_setup_sequence or nfs4_sequence_done in pure NFSv4.1 functions. We already know that those have sessions... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Ping server when our session table limits are too highTrond Myklebust
If the server requests a lower target_highest_slotid, then ensure that we ping it with at least one RPC call containing an appropriate SEQUENCE op. This ensures that the server won't need to send a recall callback in order to shrink the slot table. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Set the maximum slot table size to 1024 slotsTrond Myklebust
This means that we end up statically allocating 128 bytes for the bitmap on each slot table. For a server that supports 1MB write and read I/O sizes this means that we can completely fill the maximum 1GB TCP send/receive windows. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Move slot table and session struct definitions to nfs4session.hTrond Myklebust
Clean up. Gather NFSv4.1 slot definitions in fs/nfs/nfs4session.h. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Cleanup move session slot management to fs/nfs/nfs4session.cTrond Myklebust
NFSv4.1 session management is getting complex enough to deserve a separate file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4: Move nfs4_wait_clnt_recover and nfs4_client_recover_expired_leaseTrond Myklebust
nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease are both generic state related functions. As such, they belong in nfs4state.c, and not nfs4proc.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Clean up session drainingTrond Myklebust
Coalesce nfs4_check_drain_bc_complete and nfs4_check_drain_fc_complete into a single function that can be called when the slot table is known to be empty, then change nfs4_callback_free_slot() and nfs4_free_slot() to use it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: If slot allocation fails due to OOM, retry more quicklyTrond Myklebust
If the NFSv4.1 session slot allocation fails due to an ENOMEM condition, then set the task->tk_timeout to 1/4 second to ensure that we do retry the slot allocation more quickly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: CB_RECALL_SLOT must schedule a sequence op after updating targetsTrond Myklebust
RFC5661 requires us to make sure that the server knows we've updated our slot table size by sending at least one SEQUENCE op containing the new 'highest_slotid' value. We can do so using the 'CHECK_LEASE' functionality of the state manager. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06NFSv4.1: Remove the state manager code to resize the slot tableTrond Myklebust
The state manager no longer needs any special machinery to stop the session flow and resize the slot table. It is all done on the fly by the SEQUENCE op code now. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>