summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-11-24ext4: fix build error when DX_DEBUG is definedGabriel Krisman Bertazi
[ Upstream commit 799578ab16e86b074c184ec5abbda0bc698c7b0b ] Enabling DX_DEBUG triggers the build error below. info is an attribute of the dxroot structure. linux/fs/ext4/namei.c:2264:12: error: ‘info’ undeclared (first use in this function); did you mean ‘insl’? info->indirect_levels)); Fixes: e08ac99fa2a2 ("ext4: add largedir feature") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24f2fs: return correct errno in f2fs_gcJaegeuk Kim
[ Upstream commit 61f7725aa148ee870436a29d3a24d5c00ab7e9af ] This fixes overriding error number in f2fs_gc. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20fuse: use READ_ONCE on congestion_threshold and max_backgroundKirill Tkhai
[ Upstream commit 2a23f2b8adbe4bd584f936f7ac17a99750eed9d7 ] Since they are of unsigned int type, it's allowed to read them unlocked during reporting to userspace. Let's underline this fact with READ_ONCE() macroses. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20f2fs: mark inode dirty explicitly in recover_inode()Chao Yu
[ Upstream commit 4a1728cad6340bfbe17bd17fd158b2165cd99508 ] Mark inode dirty explicitly in the end of recover_inode() to make sure that all recoverable fields can be persisted later. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20f2fs: fix to recover inode's project id during PORChao Yu
[ Upstream commit f4474aa6e5e901ee4af21f39f1b9115aaaaec503 ] Testcase to reproduce this bug: 1. mkfs.f2fs -O extra_attr -O project_quota /dev/sdd 2. mount -t f2fs /dev/sdd /mnt/f2fs 3. touch /mnt/f2fs/file 4. sync 5. chattr -p 1 /mnt/f2fs/file 6. xfs_io -f /mnt/f2fs/file -c "fsync" 7. godown /mnt/f2fs 8. umount /mnt/f2fs 9. mount -t f2fs /dev/sdd /mnt/f2fs 10. lsattr -p /mnt/f2fs/file 0 -----------------N- /mnt/f2fs/file But actually, we expect the correct result is: 1 -----------------N- /mnt/f2fs/file The reason is we didn't recover inode.i_projid field during mount, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20f2fs: fix to recover inode's uid/gid during PORChao Yu
[ Upstream commit dc4cd1257c86451cec3e8e352cc376348e4f4af4 ] Step to reproduce this bug: 1. logon as root 2. mount -t f2fs /dev/sdd /mnt; 3. touch /mnt/file; 4. chown system /mnt/file; chgrp system /mnt/file; 5. xfs_io -f /mnt/file -c "fsync"; 6. godown /mnt; 7. umount /mnt; 8. mount -t f2fs /dev/sdd /mnt; After step 8) we will expect file's uid/gid are all system, but during recovery, these two fields were not been recovered, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20kernfs: Fix range checks in kernfs_get_target_pathBernd Edlinger
[ Upstream commit a75e78f21f9ad4b810868c89dbbabcc3931591ca ] The terminating NUL byte is only there because the buffer is allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the returned string may not be zero-terminated if it is exactly PATH_MAX characters long. Furthermore also the initial loop may theoretically exceed PATH_MAX and cause a fault. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20media: dvb: fix compat ioctl translationArnd Bergmann
[ Upstream commit 1ccbeeb888ac33627d91f1ccf0b84ef3bcadef24 ] The VIDEO_GET_EVENT and VIDEO_STILLPICTURE was added back in 2005 but it never worked because the command number is wrong. Using the right command number means we have a better chance of them actually doing the right thing, though clearly nobody has ever tried it successfully. I noticed these while auditing the remaining users of compat_time_t for y2038 bugs. This one is fine in that regard, it just never did anything. Fixes: 6e87abd0b8cb ("[DVB]: Add compat ioctl handling.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20f2fs: fix memory leak of percpu counter in fill_super()Chao Yu
[ Upstream commit 4a70e255449c9a13eed7a6eeecc85a1ea63cef76 ] In fill_super -> init_percpu_info, we should destroy percpu counter in error path, otherwise memory allcoated for percpu counter will leak. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updatedBob Peterson
[ Upstream commit 4f36cb36c9d14340bb200d2ad9117b03ce992cfe ] The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when a rgrp buffer is valid. It's cleared when the glock is invalidated, signifying that the buffer data is now invalid. But before this patch, function update_rgrp_lvb was setting the flag when it determined it had a valid lvb. But that's an invalid assumption: just because you have a valid lvb doesn't mean you have valid buffers. After all, another node may have made the lvb valid, and this node just fetched it from the glock via dlm. Consider this scenario: 1. The file system is mounted with RGRPLVB option. 2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks to GL_SKIP, it skips the gfs2_rgrp_bh_get. 3. Since loops == 0 and the allocation target (ap->target) is bigger than the largest known chunk of blocks in the rgrp (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses the call to gfs2_rgrp_bh_get there as well. 4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses gfs2_rgrp_bh_get, but it still sets sets GFS2_RDF_UPTODATE due to this invalid assumption. 5. The next time update_rgrp_lvb is called, it sees the bit is set and just returns 0, assuming both the lvb and rgrp are both uptodate. But since this is a smaller allocation, or space has been freed by another node, thus adjusting the lvb values, it decides to use the rgrp for allocations, with invalid rd_free due to the fact it was never updated. This patch changes update_rgrp_lvb so it doesn't set the UPTODATE flag anymore. That way, it has no choice but to fetch the latest values. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable eitherAl Viro
commit 762c69685ff7ad5ad7fee0656671e20a0c9c864d upstream. We need to get the underlying dentry of parent; sure, absent the races it is the parent of underlying dentry, but there's nothing to prevent losing a timeslice to preemtion in the middle of evaluation of lower_dentry->d_parent->d_inode, having another process move lower_dentry around and have its (ex)parent not pinned anymore and freed on memory pressure. Then we regain CPU and try to fetch ->d_inode from memory that is freed by that point. dentry->d_parent *is* stable here - it's an argument of ->lookup() and we are guaranteed that it won't be moved anywhere until we feed it to d_add/d_splice_alias. So we safely go that way to get to its underlying dentry. Cc: stable@vger.kernel.org # since 2009 or so Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-20ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stableAl Viro
commit e72b9dd6a5f17d0fb51f16f8685f3004361e83d0 upstream. lower_dentry can't go from positive to negative (we have it pinned), but it *can* go from negative to positive. So fetching ->d_inode into a local variable, doing a blocking allocation, checking that now ->d_inode is non-NULL and feeding the value we'd fetched earlier to a function that won't accept NULL is not a good idea. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is deadTejun Heo
commit 65de03e251382306a4575b1779c57c87889eee49 upstream. cgroup writeback tries to refresh the associated wb immediately if the current wb is dead. This is to avoid keeping issuing IOs on the stale wb after memcg - blkcg association has changed (ie. when blkcg got disabled / enabled higher up in the hierarchy). Unfortunately, the logic gets triggered spuriously on inodes which are associated with dead cgroups. When the logic is triggered on dead cgroups, the attempt fails only after doing quite a bit of work allocating and initializing a new wb. While c3aab9a0bd91 ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages") alleviated the issue significantly as it now only triggers when the inode has dirty pages. However, the condition can still be triggered before the inode is switched to a different cgroup and the logic simply doesn't make sense. Skip the immediate switching if the associated memcg is dying. This is a simplified version of the following two patches: * https://lore.kernel.org/linux-mm/20190513183053.GA73423@dennisz-mbp/ * http://lkml.kernel.org/r/156355839560.2063.5265687291430814589.stgit@buzz Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: e8a7abf5a5bd ("writeback: disassociate inodes from dying bdi_writebacks") Acked-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12NFSv4: Don't allow a cached open with a revoked delegationTrond Myklebust
[ Upstream commit be3df3dd4c70ee020587a943a31b98a0fb4b6424 ] If the delegation is marked as being revoked, we must not use it for cached opens. Fixes: 869f9dfa4d6d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12configfs: fix a deadlock in configfs_symlink()Al Viro
commit 351e5d869e5ac10cb40c78b5f2d7dfc816ad4587 upstream. Configfs abuses symlink(2). Unlike the normal filesystems, it wants the target resolved at symlink(2) time, like link(2) would've done. The problem is that ->symlink() is called with the parent directory locked exclusive, so resolving the target inside the ->symlink() is easily deadlocked. Short of really ugly games in sys_symlink() itself, all we can do is to unlock the parent before resolving the target and relock it after. However, that invalidates the checks done by the caller of ->symlink(), so we have to * check that dentry is still where it used to be (it couldn't have been moved, but it could've been unhashed) * recheck that it's still negative (somebody else might've successfully created a symlink with the same name while we were looking the target up) * recheck the permissions on the parent directory. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12configfs: provide exclusion between IO and removalsAl Viro
commit b0841eefd9693827afb9888235e26ddd098f9cef upstream. Make sure that attribute methods are not called after the item has been removed from the tree. To do so, we * at the point of no return in removals, grab ->frag_sem exclusive and mark the fragment dead. * call the methods of attributes with ->frag_sem taken shared and only after having verified that the fragment is still alive. The main benefit is for method instances - they are guaranteed that the objects they are accessing *and* all ancestors are still there. Another win is that we don't need to bother with extra refcount on config_item when opening a file - the item will be alive for as long as it stays in the tree, and we won't touch it/attributes/any associated data after it's been removed from the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12configfs: new object reprsenting tree fragmentsAl Viro
commit 47320fbe11a6059ae502c9c16b668022fdb4cf76 upstream. Refcounted, hangs of configfs_dirent, created by operations that add fragments to configfs tree (mkdir and configfs_register_{subsystem,group}). Will be used in the next commit to provide exclusion between fragment removal and ->show/->store calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12configfs_register_group() shouldn't be (and isn't) called in rmdirable partsAl Viro
commit f19e4ed1e1edbfa3c9ccb9fed17759b7d6db24c6 upstream. revert cc57c07343bd "configfs: fix registered group removal" It was an attempt to handle something that fundamentally doesn't work - configfs_register_group() should never be done in a part of tree that can be rmdir'ed. And in mainline it never had been, so let's not borrow trouble; the fix was racy anyway, it would take a lot more to make that work and desired semantics is not clear. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12configfs: stash the data we need into configfs_buffer at open timeAl Viro
commit ff4dd081977da56566a848f071aed8fa92d604a1 upstream. simplifies the ->read()/->write()/->release() instances nicely Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12configfs: Fix bool initialization/comparisonThomas Meyer
commit 3f6928c347707a65cee10a9f54b85ad5fb078b3f upstream. Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12ceph: add missing check in d_revalidate snapdir handlingAl Viro
commit 1f08529c84cfecaf1261ed9b7e17fab18541c58f upstream. We should not play with dcache without parent locked... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12ceph: fix use-after-free in __ceph_remove_cap()Luis Henriques
commit ea60ed6fcf29eebc78f2ce91491e6309ee005a01 upstream. KASAN reports a use-after-free when running xfstest generic/531, with the following trace: [ 293.903362] kasan_report+0xe/0x20 [ 293.903365] rb_erase+0x1f/0x790 [ 293.903370] __ceph_remove_cap+0x201/0x370 [ 293.903375] __ceph_remove_caps+0x4b/0x70 [ 293.903380] ceph_evict_inode+0x4e/0x360 [ 293.903386] evict+0x169/0x290 [ 293.903390] __dentry_kill+0x16f/0x250 [ 293.903394] dput+0x1c6/0x440 [ 293.903398] __fput+0x184/0x330 [ 293.903404] task_work_run+0xb9/0xe0 [ 293.903410] exit_to_usermode_loop+0xd3/0xe0 [ 293.903413] do_syscall_64+0x1a0/0x1c0 [ 293.903417] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happens because __ceph_remove_cap() may queue a cap release (__ceph_queue_cap_release) which can be scheduled before that cap is removed from the inode list with rb_erase(&cap->ci_node, &ci->i_caps); And, when this finally happens, the use-after-free will occur. This can be fixed by removing the cap from the inode list before being removed from the session list, and thus eliminating the risk of an UAF. Cc: stable@vger.kernel.org Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occursDave Wysochanski
[ Upstream commit d46b0da7a33dd8c99d969834f682267a45444ab3 ] There's a deadlock that is possible and can easily be seen with a test where multiple readers open/read/close of the same file and a disruption occurs causing reconnect. The deadlock is due a reader thread inside cifs_strict_readv calling down_read and obtaining lock_sem, and then after reconnect inside cifs_reopen_file calling down_read a second time. If in between the two down_read calls, a down_write comes from another process, deadlock occurs. CPU0 CPU1 ---- ---- cifs_strict_readv() down_read(&cifsi->lock_sem); _cifsFileInfo_put OR cifs_new_fileinfo down_write(&cifsi->lock_sem); cifs_reopen_file() down_read(&cifsi->lock_sem); Fix the above by changing all down_write(lock_sem) calls to down_write_trylock(lock_sem)/msleep() loop, which in turn makes the second down_read call benign since it will never block behind the writer while holding lock_sem. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06xfs: Correctly invert xfs_buftarg LRU isolation logicVratislav Bendel
commit 19957a181608d25c8f4136652d0ea00b3738972d upstream. Due to an inverted logic mistake in xfs_buftarg_isolate() the xfs_buffers with zero b_lru_ref will take another trip around LRU, while isolating buffers with non-zero b_lru_ref. Additionally those isolated buffers end up right back on the LRU once they are released, because b_lru_ref remains elevated. Fix that circuitous route by leaving them on the LRU as originally intended. Signed-off-by: Vratislav Bendel <vbendel@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Alex Lyakas <alex@zadara.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06fuse: truncate pending writes on O_TRUNCMiklos Szeredi
commit e4648309b85a78f8c787457832269a8712a8673e upstream. Make sure cached writes are not reordered around open(..., O_TRUNC), with the obvious wrong results. Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06fuse: flush dirty data/metadata before non-truncate setattrMiklos Szeredi
commit b24e7598db62386a95a3c8b9c75630c5d56fe077 upstream. If writeback cache is enabled, then writes might get reordered with chmod/chown/utimes. The problem with this is that performing the write in the fuse daemon might itself change some of these attributes. In such case the following sequence of operations will result in file ending up with the wrong mode, for example: int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL); write (fd, "1", 1); fchown (fd, 0, 0); fchmod (fd, 04755); close (fd); This patch fixes this by flushing pending writes before performing chown/chmod/utimes. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06NFSv4: Fix leak of clp->cl_acceptor stringChuck Lever
[ Upstream commit 1047ec868332034d1fbcb2fae19fe6d4cb869ff2 ] Our client can issue multiple SETCLIENTID operations to the same server in some circumstances. Ensure that calls to nfs4_proc_setclientid() after the first one do not overwrite the previously allocated cl_acceptor string. unreferenced object 0xffff888461031800 (size 32): comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s) hex dump (first 32 bytes): 6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31 nfs@klimt.ib.101 35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00 5granger.net.... backtrace: [<00000000ab820188>] __kmalloc+0x128/0x176 [<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss] [<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4] [<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4] [<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4] [<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4] [<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4] [<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4] [<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4] [<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c [<00000000797e947c>] vfs_get_tree+0x20/0xc7 [<00000000ecabaaa8>] fc_mount+0xe/0x36 [<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d [<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4] [<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4] [<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs] Fixes: f11b2a1cfbf5 ("nfs4: copy acceptor name from context ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06fs: ocfs2: fix a possible null-pointer dereference in ↵Jia-Ju Bai
ocfs2_info_scan_inode_alloc() [ Upstream commit 2abb7d3b12d007c30193f48bebed781009bebdd2 ] In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283 to check whether inode_alloc is NULL: if (inode_alloc) When inode_alloc is NULL, it is used on line 287: ocfs2_inode_lock(inode_alloc, &bh, 0); ocfs2_inode_lock_full_nested(inode, ...) struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); Thus, a possible null-pointer dereference may occur. To fix this bug, inode_alloc is checked on line 286. This bug is found by a static analysis tool STCheck written by us. Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()Jia-Ju Bai
[ Upstream commit 583fee3e12df0e6f1f66f063b989d8e7fed0e65a ] In ocfs2_write_end_nolock(), there are an if statement on lines 1976, 2047 and 2058, to check whether handle is NULL: if (handle) When handle is NULL, it is used on line 2045: ocfs2_update_inode_fsync_trans(handle, inode, 1); oi->i_sync_tid = handle->h_transaction->t_tid; Thus, a possible null-pointer dereference may occur. To fix this bug, handle is checked before calling ocfs2_update_inode_fsync_trans(). This bug is found by a static analysis tool STCheck written by us. Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()Jia-Ju Bai
[ Upstream commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d ] In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to check whether loc->xl_entry is NULL: if (loc->xl_entry) When loc->xl_entry is NULL, it is used on line 2158: ocfs2_xa_add_entry(loc, name_hash); loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash); loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size); and line 2164: ocfs2_xa_add_namevalue(loc, xi); loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len); loc->xl_entry->xe_name_len = xi->xi_name_len; Thus, possible null-pointer dereferences may occur. To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry() abnormally returns with -EINVAL. These bugs are found by a static analysis tool STCheck written by us. [akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()] Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06ocfs2: clear zero in unaligned direct IOJia Guo
[ Upstream commit 7a243c82ea527cd1da47381ad9cd646844f3b693 ] Unused portion of a part-written fs-block-sized block is not set to zero in unaligned append direct write.This can lead to serious data inconsistencies. Ocfs2 manage disk with cluster size(for example, 1M), part-written in one cluster will change the cluster state from UN-WRITTEN to WRITTEN, VFS(function dio_zero_block) doesn't do the cleaning because bh's state is not set to NEW in function ocfs2_dio_wr_get_block when we write a WRITTEN cluster. For example, the cluster size is 1M, file size is 8k and we direct write from 14k to 15k, then 12k~14k and 15k~16k will contain dirty data. We have to deal with two cases: 1.The starting position of direct write is outside the file. 2.The starting position of direct write is located in the file. We need set bh's state to NEW in the first case. In the second case, we need mapped twice because bh's state of area out file should be set to NEW while area in file not. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com Signed-off-by: Jia Guo <guojia12@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06fs: cifs: mute -Wunused-const-variable messageAustin Kim
[ Upstream commit dd19c106a36690b47bb1acc68372f2b472b495b8 ] After 'Initial git repository build' commit, 'mapping_table_ERRHRD' variable has not been used. So 'mapping_table_ERRHRD' const variable could be removed to mute below warning message: fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable] static const struct smb_to_posix_error mapping_table_ERRHRD[] = { ^ Signed-off-by: Austin Kim <austindh.kim@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06nfs: Fix nfsi->nrequests count error on nfs_inode_remove_requestZhangXiaoxu
[ Upstream commit 33ea5aaa87cdae0f9af4d6b7ee4f650a1a36fd1d ] When xfstests testing, there are some WARNING as below: WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8 Modules linked in: CPU: 0 PID: 6235 Comm: umount.nfs Hardware name: linux,dummy-virt (DT) pstate: 60000005 (nZCv daif -PAN -UAO) pc : nfs_clear_inode+0x9c/0xd8 lr : nfs_evict_inode+0x60/0x78 sp : fffffc000f68fc00 x29: fffffc000f68fc00 x28: fffffe00c53155c0 x27: fffffe00c5315000 x26: fffffc0009a63748 x25: fffffc000f68fd18 x24: fffffc000bfaaf40 x23: fffffc000936d3c0 x22: fffffe00c4ff5e20 x21: fffffc000bfaaf40 x20: fffffe00c4ff5d10 x19: fffffc000c056000 x18: 000000000000003c x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000040 x14: 0000000000000228 x13: fffffc000c3a2000 x12: 0000000000000045 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : fffffc00084b027c x5 : fffffc0009a64000 x4 : fffffe00c0e77400 x3 : fffffc000c0563a8 x2 : fffffffffffffffb x1 : 000000000000764e x0 : 0000000000000001 Call trace: nfs_clear_inode+0x9c/0xd8 nfs_evict_inode+0x60/0x78 evict+0x108/0x380 dispose_list+0x70/0xa0 evict_inodes+0x194/0x210 generic_shutdown_super+0xb0/0x220 nfs_kill_super+0x40/0x88 deactivate_locked_super+0xb4/0x120 deactivate_super+0x144/0x160 cleanup_mnt+0x98/0x148 __cleanup_mnt+0x38/0x50 task_work_run+0x114/0x160 do_notify_resume+0x2f8/0x308 work_pending+0x8/0x14 The nrequest should be increased/decreased only if PG_INODE_REF flag was setted. But in the nfs_inode_remove_request function, it maybe decrease when no PG_INODE_REF flag, this maybe lead nrequests count error. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06exec: load_script: Do not exec truncated interpreter pathKees Cook
[ Upstream commit b5372fe5dc84235dbe04998efdede3c4daa866a9 ] Commit 8099b047ecc4 ("exec: load_script: don't blindly truncate shebang string") was trying to protect against a confused exec of a truncated interpreter path. However, it was overeager and also refused to truncate arguments as well, which broke userspace, and it was reverted. This attempts the protection again, but allows arguments to remain truncated. In an effort to improve readability, helper functions and comments have been added. Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Samuel Dionne-Riel <samuel@dionne-riel.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Graham Christensen <graham@grahamc.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06f2fs: flush quota blocks after turnning it offJaegeuk Kim
[ Upstream commit 0e0667b625cf64243df83171bff61f9d350b9ca5 ] After quota_off, we'll get some dirty blocks. If put_super don't have a chance to flush them by checkpoint, it causes NULL pointer exception in end_io after iput(node_inode). (e.g., by checkpoint=disable) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()Qu Wenruo
commit 4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream. In btrfs_read_block_groups(), if we have an invalid block group which has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS feature, we error out without freeing the block group cache. This patch will add the missing btrfs_put_block_group() to prevent memory leak. Note for stable backports: the file to patch in versions <= 5.3 is fs/btrfs/extent-tree.c Fixes: 49303381f19a ("Btrfs: bail out if block group has different mixed flag") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29CIFS: avoid using MID 0xFFFFRoberto Bergantinos Corpas
commit 03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream. According to MS-CIFS specification MID 0xFFFF should not be used by the CIFS client, but we actually do. Besides, this has proven to cause races leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1, MID is a 2 byte value easy to reach in CurrentMid which may conflict with an oplock break notification request coming from server Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.cDavid Hildenbrand
commit aad5f69bc161af489dbb5934868bd347282f0764 upstream. There are three places where we access uninitialized memmaps, namely: - /proc/kpagecount - /proc/kpageflags - /proc/kpagecgroup We have initialized memmaps either when the section is online or when the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. For example, not onlining a DIMM during boot and calling /proc/kpagecount with CONFIG_PAGE_POISONING: :/# cat /proc/kpagecount > tmp.test BUG: unable to handle page fault for address: fffffffffffffffe #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 RIP: 0010:kpagecount_read+0xce/0x1e0 Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202 RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000 RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08 FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0 Call Trace: proc_reg_read+0x3c/0x60 vfs_read+0xc5/0x180 ksys_read+0x68/0xe0 do_syscall_64+0x5c/0xa0 entry_SYSCALL_64_after_hwframe+0x49/0xbe For now, let's drop support for ZONE_DEVICE from the three pseudo files in order to fix this. To distinguish offline memory (with garbage memmap) from ZONE_DEVICE memory with properly initialized memmaps, we would have to check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage of both (especially, special casing devmem) is frowned upon and needs to be reworked. The fundamental issue we have is: if (pfn_to_online_page(pfn)) { /* memmap initialized */ } else if (pfn_valid(pfn)) { /* * ??? * a) offline memory. memmap garbage. * b) devmem: memmap initialized to ZONE_DEVICE. * c) devmem: reserved for driver. memmap garbage. * (d) devmem: memmap currently initializing - garbage) */ } We'll leave the pfn_zone_device_reserved() check in stable_page_flags() in place as that function is also used from memory failure. We now no longer dump information about pages that are not in use anymore - offline. Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: <stable@vger.kernel.org> [4.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29ocfs2: fix panic due to ocfs2_wq is nullYi Li
commit b918c43021baaa3648de09e19a4a3dd555a45f40 upstream. mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters an error. ocfs2_initialize_super() returns before allocating ocfs2_wq. ocfs2_dismount_volume() triggers the following panic. Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted. Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30 Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30 Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30 Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30 Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30 ------------[ cut here ]------------ Oops: 0002 [#1] SMP NOPTI CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E 4.14.148-200.ckv.x86_64 #1 Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018 task: ffff967af0520000 task.stack: ffffa5f05484000 RIP: 0010:mutex_lock+0x19/0x20 Call Trace: flush_workqueue+0x81/0x460 ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2] ocfs2_dismount_volume+0x84/0x400 [ocfs2] ocfs2_fill_super+0xa4/0x1270 [ocfs2] ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2] mount_bdev+0x17f/0x1c0 mount_fs+0x3a/0x160 Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com Signed-off-by: Yi Li <yilikernel@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17xfs: clear sb->s_fs_info on mount failureDave Chinner
commit c9fbd7bbc23dbdd73364be4d045e5d3612cf6e82 upstream. We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. Essentially, the machine was under memory pressure when the mount was being run, xfs_fs_fill_super() failed after allocating the xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and freed the xfs_mount, but the sb->s_fs_info field still pointed to the freed memory. Hence when the superblock shrinker then ran it fell off the bad pointer. With the superblock shrinker problem fixed at teh VFS level, this stale s_fs_info pointer is still a problem - we use it unconditionally in ->put_super when the superblock is being torn down, and hence we can still trip over it after a ->fill_super call failure. Hence we need to clear s_fs_info if xfs-fs_fill_super() fails, and we need to check if it's valid in the places it can potentially be dereferenced after a ->fill_super failure. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ajay Kaher <akaher@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17Fix the locking in dcache_readdir() and friendsAl Viro
commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999 upstream. There are two problems in dcache_readdir() - one is that lockless traversal of the list needs non-trivial cooperation of d_alloc() (at least a switch to list_add_rcu(), and probably more than just that) and another is that it assumes that no removal will happen without the directory locked exclusive. Said assumption had always been there, never had been stated explicitly and is violated by several places in the kernel (devpts and selinuxfs). * replacement of next_positive() with different calling conventions: it returns struct list_head * instead of struct dentry *; the latter is passed in and out by reference, grabbing the result and dropping the original value. * scan is under ->d_lock. If we run out of timeslice, cursor is moved after the last position we'd reached and we reschedule; then the scan continues from that place. To avoid livelocks between multiple lseek() (with cursors getting moved past each other, never reaching the real entries) we always skip the cursors, need_resched() or not. * returned list_head * is either ->d_child of dentry we'd found or ->d_subdirs of parent (if we got to the end of the list). * dcache_readdir() and dcache_dir_lseek() switched to new helper. dcache_readdir() always holds a reference to dentry passed to dir_emit() now. Cursor is moved to just before the entry where dir_emit() has failed or into the very end of the list, if we'd run out. * move_cursor() eliminated - it had sucky calling conventions and after fixing that it became simply list_move() (in lseek and scan_positives) or list_move_tail() (in readdir). All operations with the list are under ->d_lock now, and we do not depend upon having all file removals done with parent locked exclusive anymore. Cc: stable@vger.kernel.org Reported-by: "zhengbin (A)" <zhengbin13@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17NFS: Fix O_DIRECT accounting of number of bytes read/writtenTrond Myklebust
commit 031d73ed768a40684f3ca21992265ffdb6a270bf upstream. When a series of O_DIRECT reads or writes are truncated, either due to eof or due to an error, then we should return the number of contiguous bytes that were received/sent starting at the offset specified by the application. Currently, we are failing to correctly check contiguity, and so we're failing the generic/465 in xfstests when the race between the read and write RPCs causes the file to get extended while the 2 reads are outstanding. If the first read RPC call wins the race and returns with eof set, we should treat the second read RPC as being truncated. Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> Fixes: 1ccbad9f9f9bd ("nfs: fix DIO good bytes calculation") Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17btrfs: fix incorrect updating of log root treeJosef Bacik
commit 4203e968947071586a98b5314fd7ffdea3b4f971 upstream. We've historically had reports of being unable to mount file systems because the tree log root couldn't be read. Usually this is the "parent transid failure", but could be any of the related errors, including "fsid mismatch" or "bad tree block", depending on which block got allocated. The modification of the individual log root items are serialized on the per-log root root_mutex. This means that any modification to the per-subvol log root_item is completely protected. However we update the root item in the log root tree outside of the log root tree log_mutex. We do this in order to allow multiple subvolumes to be updated in each log transaction. This is problematic however because when we are writing the log root tree out we update the super block with the _current_ log root node information. Since these two operations happen independently of each other, you can end up updating the log root tree in between writing out the dirty blocks and setting the super block to point at the current root. This means we'll point at the new root node that hasn't been written out, instead of the one we should be pointing at. Thus whatever garbage or old block we end up pointing at complains when we mount the file system later and try to replay the log. Fix this by copying the log's root item into a local root item copy. Then once we're safely under the log_root_tree->log_mutex we update the root item in the log_root_tree. This way we do not modify the log_root_tree while we're committing it, fixing the problem. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Chris Mason <clm@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17CIFS: Force reval dentry if LOOKUP_REVAL flag is setPavel Shilovsky
commit 0b3d0ef9840f7be202393ca9116b857f6f793715 upstream. Mark inode for force revalidation if LOOKUP_REVAL flag is set. This tells the client to actually send a QueryInfo request to the server to obtain the latest metadata in case a directory or a file were changed remotely. Only do that if the client doesn't have a lease for the file to avoid unneeded round trips to the server. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17CIFS: Force revalidate inode when dentry is stalePavel Shilovsky
commit c82e5ac7fe3570a269c0929bf7899f62048e7dbc upstream. Currently the client indicates that a dentry is stale when inode numbers or type types between a local inode and a remote file don't match. If this is the case attributes is not being copied from remote to local, so, it is already known that the local copy has stale metadata. That's why the inode needs to be marked for revalidation in order to tell the VFS to lookup the dentry again before openning a file. This prevents unexpected stale errors to be returned to the user space when openning a file. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17CIFS: Gracefully handle QueryInfo errors during openPavel Shilovsky
commit 30573a82fb179420b8aac30a3a3595aa96a93156 upstream. Currently if the client identifies problems when processing metadata returned in CREATE response, the open handle is being leaked. This causes multiple problems like a file missing a lease break by that client which causes high latencies to other clients accessing the file. Another side-effect of this is that the file can't be deleted. Fix this by closing the file after the client hits an error after the file was opened and the open descriptor wasn't returned to the user space. Also convert -ESTALE to -EOPENSTALE to allow the VFS to revalidate a dentry and retry the open. Cc: <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17f2fs: use EINVAL for superblock with invalid magicIcenowy Zheng
[ Upstream commit 38fb6d0ea34299d97b031ed64fe994158b6f8eb3 ] The kernel mount_block_root() function expects -EACESS or -EINVAL for a unmountable filesystem when trying to mount the root with different filesystem types. However, in 5.3-rc1 the behavior when F2FS code cannot find valid block changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes mount_block_root() fail when trying to probe F2FS. When the magic number of the superblock mismatches, it has a high probability that it's just not a F2FS. In this case return -EINVAL seems to be a better result, and this return value can make mount_block_root() probing work again. Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in other cases (the magic matches but the superblock cannot be recognized). Fixes: 10f966bbf521 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11vfs: Fix EOVERFLOW testing in put_compat_statfs64Eric Sandeen
commit cc3a7bfe62b947b423fcb2cfe89fcba92bf48fa3 upstream. Today, put_compat_statfs64() disallows nearly any field value over 2^32 if f_bsize is only 32 bits, but that makes no sense. compat_statfs64 is there for the explicit purpose of providing 64-bit fields for f_files, f_ffree, etc. And f_bsize is always only 32 bits. As a result, 32-bit userspace gets -EOVERFLOW for i.e. large file counts even with -D_FILE_OFFSET_BITS=64 set. In reality, only f_bsize and f_frsize can legitimately overflow (fields like f_type and f_namelen should never be large), so test only those fields. This bug was discussed at length some time ago, and this is the proposal Al suggested at https://lkml.org/lkml/2018/8/6/640. It seemed to get dropped amid the discussion of other related changes, but this part seems obviously correct on its own, so I've picked it up and sent it, for expediency. Fixes: 64d2ab32efe3 ("vfs: fix put_compat_statfs64() does not handle errors") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11fuse: fix memleak in cuse_channel_openzhengbin
[ Upstream commit 9ad09b1976c562061636ff1e01bfc3a57aebe56b ] If cuse_send_init fails, need to fuse_conn_put cc->fc. cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1) ->fuse_dev_alloc->fuse_conn_get ->fuse_dev_free->fuse_conn_put Fixes: cc080e9e9be1 ("fuse: introduce per-instance fuse_dev structure") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11pNFS: Ensure we do clear the return-on-close layout stateid on fatal errorsTrond Myklebust
[ Upstream commit 9c47b18cf722184f32148784189fca945a7d0561 ] IF the server rejected our layout return with a state error such as NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want to clear out all the remaining layout segments and mark that stateid as invalid. Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>