summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-05-26hfsplus: stop workqueue when fill_super() failedTetsuo Handa
commit 66072c29328717072fd84aaff3e070e3f008ba77 upstream. syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync(). As far as I can see, it is hfsplus_mark_mdb_dirty() from hfsplus_new_inode() in hfsplus_fill_super() that calls queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does not fail if queue_delayed_work() was called, and the out_put_hidden_dir label is the appropriate location to call cancel_delayed_work_sync(). [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9 Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ext2: fix a block leakAl Viro
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream. open file, unlink it, then use ioctl(2) to make it immutable or append only. Now close it and watch the blocks *not* freed... Immutable/append-only checks belong in ->setattr(). Note: the bug is old and backport to anything prior to 737f2e93b972 ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr(). Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26btrfs: fix reading stale metadata blocks after degraded raid1 mountsLiu Bo
commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream. If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e. btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child Unfortunately, the parent lock got released before reading child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk. A simple PoC is included in btrfs/124, 0. A two-disk raid1 btrfs, 1. Right after mkfs.btrfs, block A is allocated to be device tree's root. 2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet. 3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list. 4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem. 5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A. 6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk. To avoid the problem, parent transid needs to be passed to read_tree_block(). In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child. This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 by 'gen'. Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26btrfs: fix crash when trying to resume balance without the resume flagAnand Jain
commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream. We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance() only, which isn't called during the remount. So when resuming from the paused balance we hit the bug: kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8 Reproducer: On a mounted filesystem: btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async(). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26Btrfs: fix xattr loss after power failureFilipe Manana
commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream. If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up not logging all its xattrs. This results in deleting all xattrs when replying the log after a power failure. Trivial reproducer $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ sync $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power failure> $ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar <empty output> $ So fix this by making sure all xattrs are logged if we log a file's inode item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor BTRFS_INODE_COPY_EVERYTHING were set in the inode. Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: <stable@vger.kernel.org> # 4.2+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26procfs: fix pthread cross-thread naming if !PR_DUMPABLEJanis Danisevskis
commit 1b3044e39a89cb1d4d5313da477e8dfea2b5232d upstream. The PR_DUMPABLE flag causes the pid related paths of the proc file system to be owned by ROOT. The implementation of pthread_set/getname_np however needs access to /proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this implementation is locked out. This patch installs a special permission function for the file "comm" that grants read and write access to all threads of the same group regardless of the ownership of the inode. For all other threads the function falls back to the generic inode permission check. [akpm@linux-foundation.org: fix spello in comment] Signed-off-by: Janis Danisevskis <jdanis@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: Minfei Huang <mnfhuang@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Jann Horn <jann@thejh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26proc read mm's {arg,env}_{start,end} with mmap semaphore taken.Mateusz Guzik
commit a3b609ef9f8b1dbfe97034ccad6cd3fe71fbe7ab upstream. Only functions doing more than one read are modified. Consumeres happened to deal with possibly changing data, but it does not seem like a good thing to rely on. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jarod Wilson <jarod@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anshuman Khandual <anshuman.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26proc: meminfo: estimate available memory more conservativelyJohannes Weiner
commit 84ad5802a33a4964a49b8f7d24d80a214a096b19 upstream. The MemAvailable item in /proc/meminfo is to give users a hint of how much memory is allocatable without causing swapping, so it excludes the zones' low watermarks as unavailable to userspace. However, for a userspace allocation, kswapd will actually reclaim until the free pages hit a combination of the high watermark and the page allocator's lowmem protection that keeps a certain amount of DMA and DMA32 memory from userspace as well. Subtract the full amount we know to be unavailable to userspace from the number of free pages when calculating MemAvailable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26pipe: cap initial pipe capacity according to pipe-max-size limitMichael Kerrisk (man-pages)
commit 086e774a57fba4695f14383c0818994c0b31da7c upstream. This is a patch that provides behavior that is more consistent, and probably less surprising to users. I consider the change optional, and welcome opinions about whether it should be applied. By default, pipes are created with a capacity of 64 kiB. However, /proc/sys/fs/pipe-max-size may be set smaller than this value. In this scenario, an unprivileged user could thus create a pipe whose initial capacity exceeds the limit. Therefore, it seems logical to cap the initial pipe capacity according to the value of pipe-max-size. The test program shown earlier in this patch series can be used to demonstrate the effect of the change brought about with this patch: # cat /proc/sys/fs/pipe-max-size 1048576 # sudo -u mtk ./test_F_SETPIPE_SZ 1 Initial pipe capacity: 65536 # echo 10000 > /proc/sys/fs/pipe-max-size # cat /proc/sys/fs/pipe-max-size 16384 # sudo -u mtk ./test_F_SETPIPE_SZ 1 Initial pipe capacity: 16384 # ./test_F_SETPIPE_SZ 1 Initial pipe capacity: 65536 The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size caps the initial allocation for a new pipe for unprivileged users, but not for privileged users. Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.com Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Willy Tarreau <w@1wt.eu> Cc: <socketpair@gmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jens Axboe <axboe@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26lockd: lost rollback of set_grace_period() in lockd_down_net()Vasily Averin
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream. Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect, it removes lockd_manager and disarm grace_period_end for init_net only. If nfsd was started from another net namespace lockd_up_net() calls set_grace_period() that adds lockd_manager into per-netns list and queues grace_period_end delayed work. These action should be reverted in lockd_down_net(). Otherwise it can lead to double list_add on after restart nfsd in netns, and to use-after-free if non-disarmed delayed work will be executed after netns destroy. Fixes: efda760fe95e ("lockd: fix lockd shutdown race") Cc: stable@vger.kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16f2fs: fix a dead loop in f2fs_fiemap()Wei Fang
commit b86e33075ed1909d8002745b56ecf73b833db143 upstream. A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16bdi: Fix oops in wb_workfn()Jan Kara
commit b8b784958eccbf8f51ebeee65282ca3fd59ea391 upstream. Syzbot has reported that it can hit a NULL pointer dereference in wb_workfn() due to wb->bdi->dev being NULL. This indicates that wb_workfn() was called for an already unregistered bdi which should not happen as wb_shutdown() called from bdi_unregister() should make sure all pending writeback works are completed before bdi is unregistered. Except that wb_workfn() itself can requeue the work with: mod_delayed_work(bdi_wq, &wb->dwork, 0); and if this happens while wb_shutdown() is waiting in: flush_delayed_work(&wb->dwork); the dwork can get executed after wb_shutdown() has finished and bdi_unregister() has cleared wb->bdi->dev. Make wb_workfn() use wakeup_wb() for requeueing the work which takes all the necessary precautions against racing with bdi unregistration. CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> CC: Tejun Heo <tj@kernel.org> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16xfs: prevent creating negative-sized file via INSERT_RANGEDarrick J. Wong
commit 7d83fb14258b9961920cd86f0b921caaeb3ebe85 upstream. During the "insert range" fallocate operation, i_size grows by the specified 'len' bytes. XFS verifies that i_size + len < s_maxbytes, as it should. But this comparison is done using the signed 'loff_t', and 'i_size + len' can wrap around to a negative value, causing the check to incorrectly pass, resulting in an inode with "negative" i_size. This is possible on 64-bit platforms, where XFS sets s_maxbytes = LLONG_MAX. ext4 and f2fs don't run into this because they set a smaller s_maxbytes. Fix it by using subtraction instead. Reproducer: xfs_io -f file -c "truncate $(((1<<63)-1))" -c "finsert 0 4096" Fixes: a904b1ca5751 ("xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: <stable@vger.kernel.org> # v4.1+ Originally-From: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix signed integer addition overflow too] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-02ext4: fix bitmap position validationLukas Czerner
commit 22be37acce25d66ecf6403fc8f44df9c5ded2372 upstream. Currently in ext4_valid_block_bitmap() we expect the bitmap to be positioned anywhere between 0 and s_blocksize clusters, but that's wrong because the bitmap can be placed anywhere in the block group. This causes false positives when validating bitmaps on perfectly valid file system layouts. Fix it by checking whether the bitmap is within the group boundary. The problem can be reproduced using the following mkfs -t ext3 -E stride=256 /dev/vdb1 mount /dev/vdb1 /mnt/test cd /mnt/test wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz tar xf linux-4.16.3.tar.xz This will result in the warnings in the logs EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap [ Changed slightly for clarity and to not drop a overflow test -- TYT ] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: 7dac4a1726a9 ("ext4: add validity checks for bitmap block numbers") Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-02ext4: add validity checks for bitmap block numbersTheodore Ts'o
commit 7dac4a1726a9c64a517d595c40e95e2d0d135f6f upstream. An privileged attacker can cause a crash by mounting a crafted ext4 image which triggers a out-of-bounds read in the function ext4_valid_block_bitmap() in fs/ext4/balloc.c. This issue has been assigned CVE-2018-1093. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-02ext4: set h_journal if there is a failure starting a reserved handleTheodore Ts'o
commit b2569260d55228b617bd82aba6d0db2faeeb4116 upstream. If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f36829b ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-02ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKSEric Biggers
commit 349fa7d6e1935f49bf4161c4900711b2989180a9 upstream. During the "insert range" fallocate operation, extents starting at the range offset are shifted "right" (to a higher file offset) by the range length. But, as shown by syzbot, it's not validated that this doesn't cause extents to be shifted beyond EXT_MAX_BLOCKS. In that case ->ee_block can wrap around, corrupting the extent tree. Fix it by returning an error if the space between the end of the last extent and EXT4_MAX_BLOCKS is smaller than the range being inserted. This bug can be reproduced by running the following commands when the current directory is on an ext4 filesystem with a 4k block size: fallocate -l 8192 file fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file fallocate --insert-range -l 8192 file Then after unmounting the filesystem, e2fsck reports corruption. Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29jbd2: fix use after free in kjournald2()Sahitya Tummala
commit dbfcef6b0f4012c57bc0b6e0e660d5ed12a5eaed upstream. Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2(). TASK 1: umount cmd: |--jbd2_journal_destroy() { |--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } |--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. } Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29cifs: do not allow creating sockets except with SMB1 posix exensionsSteve French
commit 1d0cffa674cfa7d185a302c8c6850fc50b893bed upstream. RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24writeback: safer lock nestingGreg Thelen
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream. lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set. unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain. This existing pattern is thus suspicious: lock_page_memcg(page); unlocked_inode_to_wb_begin(inode, &locked); ... unlocked_inode_to_wb_end(inode, locked); unlock_page_memcg(page); If both inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg(). truncate __cancel_dirty_page lock_page_memcg unlocked_inode_to_wb_begin unlocked_inode_to_wb_end <interrupts mistakenly enabled> <interrupt> end_page_writeback test_clear_page_writeback lock_page_memcg <deadlock> unlock_page_memcg Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature). If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute: cd /mnt/cgroup/memory mkdir a b echo 1 > a/memory.move_charge_at_immigrate echo 1 > b/memory.move_charge_at_immigrate ( echo $BASHPID > a/cgroup.procs while true; do dd if=/dev/zero of=/mnt/big bs=1M count=256 done ) & while true; do sync done & sleep 1h & SLEEP=$! while true; do echo $SLEEP > a/cgroup.procs echo $SLEEP > b/cgroup.procs done The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 Wang Long said "this deadlock occurs three times in our environment" [gthelen@google.com: v4] Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com [akpm@linux-foundation.org: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Wang Long <wanglong19@meituan.com> Acked-by: Wang Long <wanglong19@meituan.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> [v4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [natechancellor: Applied to 4.4 based on Greg's backport on lkml.org] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24fanotify: fix logic of events on childAmir Goldstein
commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 upstream. When event on child inodes are sent to the parent inode mark and parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event will not be delivered to the listener process. However, if the same process also has a mount mark, the event to the parent inode will be delivered regadless of the mount mark mask. This behavior is incorrect in the case where the mount mark mask does not contain the specific event type. For example, the process adds a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD) and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR). A modify event on a file inside that directory (and inside that mount) should not create a FAN_MODIFY event, because neither of the marks requested to get that event on the file. Fixes: 1968f5eed54c ("fanotify: use both marks when possible") Cc: stable <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> [natechancellor: Fix small conflict due to lack of 3cd5eca8d7a2f] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: bugfix for mmaped pages in mpage_release_unused_pages()wangguang
commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream. Pages clear buffers after ext4 delayed block allocation failed, However, it does not clean its pte_dirty flag. if the pages unmap ,in cording to the pte_dirty , unmap_page_range may try to call __set_page_dirty, which may lead to the bugon at mpage_prepare_extent_to_map:head = page_buffers(page);. This patch just call clear_page_dirty_for_io to clean pte_dirty at mpage_release_unused_pages for pages mmaped. Steps to reproduce the bug: (1) mmap a file in ext4 addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); (2) return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent which causes this log message to be print: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3)Unmap the addr cause warning at __set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute,then bugon happen. Cc: stable@vger.kernel.org Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> [@nathanchance: Resolved conflict from lack of 09cbfeaf1a5a6] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24autofs: mount point create should honour passed in modeIan Kent
commit 1e6306652ba18723015d1b4967fe9de55f042499 upstream. The autofs file system mkdir inode operation blindly sets the created directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can cause selinux dac_override denials. But the function also checks if the caller is the daemon (as no-one else should be able to do anything here) so there's no point in not honouring the passed in mode, allowing the daemon to set appropriate mode when required. Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24Don't leak MNT_INTERNAL away from internal mountsAl Viro
commit 16a34adb9392b2fe4195267475ab5b472e55292c upstream. We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for their copies. As it is, creating a deep stack of bindings of /proc/*/ns/* somewhere in a new namespace and exiting yields a stack overflow. Cc: stable@kernel.org Reported-by: Alexander Aring <aring@mojatatu.com> Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24jffs2_kill_sb(): deal with failed allocationsAl Viro
commit c66b23c2840446a82c389e4cb1a12eb2a71fa2e4 upstream. jffs2_fill_super() might fail to allocate jffs2_sb_info; jffs2_kill_sb() must survive that. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()Theodore Ts'o
commit c755e251357a0cee0679081f08c3f4ba797a8009 upstream. The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by 2e81a4eeedca. The deadlock can be reproduced via: dmesg -n 7 mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768 mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc mkdir /vdc/a umount /vdc mount -t ext4 /dev/vdc /vdc echo foo > /vdc/a/foo and looks like this: [ 11.158815] [ 11.160276] ============================================= [ 11.161960] [ INFO: possible recursive locking detected ] [ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W [ 11.161960] --------------------------------------------- [ 11.161960] bash/2519 is trying to acquire lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] [ 11.161960] but task is already holding lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] other info that might help us debug this: [ 11.161960] Possible unsafe locking scenario: [ 11.161960] [ 11.161960] CPU0 [ 11.161960] ---- [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] [ 11.161960] *** DEADLOCK *** [ 11.161960] [ 11.161960] May be due to missing lock nesting notation [ 11.161960] [ 11.161960] 4 locks held by bash/2519: [ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e [ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a [ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622 [ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] stack backtrace: [ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160 [ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 11.161960] Call Trace: [ 11.161960] dump_stack+0x72/0xa3 [ 11.161960] __lock_acquire+0xb7c/0xcb9 [ 11.161960] ? kvm_clock_read+0x1f/0x29 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] lock_acquire+0x106/0x18a [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] down_write+0x39/0x72 [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ? _raw_read_unlock+0x22/0x2c [ 11.161960] ? jbd2_journal_extend+0x1e2/0x262 [ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60 [ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d [ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_try_add_inline_entry+0x69/0x152 [ 11.161960] ext4_add_entry+0xa3/0x848 [ 11.161960] ? __brelse+0x14/0x2f [ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f [ 11.161960] ext4_add_nondir+0x17/0x5b [ 11.161960] ext4_create+0xcf/0x133 [ 11.161960] ? ext4_mknod+0x12f/0x12f [ 11.161960] lookup_open+0x39e/0x3fb [ 11.161960] ? __wake_up+0x1a/0x40 [ 11.161960] ? lock_acquire+0x11e/0x18a [ 11.161960] path_openat+0x35c/0x67a [ 11.161960] ? sched_clock_cpu+0xd7/0xf2 [ 11.161960] do_filp_open+0x36/0x7c [ 11.161960] ? _raw_spin_unlock+0x22/0x2c [ 11.161960] ? __alloc_fd+0x169/0x173 [ 11.161960] do_sys_open+0x59/0xcc [ 11.161960] SyS_open+0x1d/0x1f [ 11.161960] do_int80_syscall_32+0x4f/0x61 [ 11.161960] entry_INT80_32+0x2f/0x2f [ 11.161960] EIP: 0xb76ad469 [ 11.161960] EFLAGS: 00000286 CPU: 0 [ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6 [ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0 [ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq) Reported-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: fix crashes in dioread_nolock modeJan Kara
commit 74dae4278546b897eb81784fdfcce872ddd8b2b8 upstream. Competing overwrite DIO in dioread_nolock mode will just overwrite pointer to io_end in the inode. This may result in data corruption or extent conversion happening from IO completion interrupt because we don't properly set buffer_defer_completion() when unlocked DIO races with locked DIO to unwritten extent. Since unlocked DIO doesn't need io_end for anything, just avoid allocating it and corrupting pointer from inode for locked DIO. A cleaner fix would be to avoid these games with io_end pointer from the inode but that requires more intrusive changes so we leave that for later. Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: don't allow r/w mounts if metadata blocks overlap the superblockTheodore Ts'o
commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 upstream. If some metadata block, such as an allocation bitmap, overlaps the superblock, it's very likely that if the file system is mounted read/write, the results will not be pretty. So disallow r/w mounts for file systems corrupted in this particular way. Backport notes: 3.18.y is missing bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") and e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") so we simply use the sb MS_RDONLY check from pre bc98a42c1f7d in place of the sb_rdonly function used in the upstream variant of the patch. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Harsh Shandilya <harsh@prjkt.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: fail ext4_iget for root directory if unallocatedTheodore Ts'o
commit 8e4b5eae5decd9dfe5a4ee369c22028f90ab4c44 upstream. If the root directory has an i_links_count of zero, then when the file system is mounted, then when ext4_fill_super() notices the problem and tries to call iput() the root directory in the error return path, ext4_evict_inode() will try to free the inode on disk, before all of the file system structures are set up, and this will result in an OOPS caused by a NULL pointer dereference. This issue has been assigned CVE-2018-1092. https://bugzilla.kernel.org/show_bug.cgi?id=199179 https://bugzilla.redhat.com/show_bug.cgi?id=1560777 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ext4: don't update checksum of new initialized bitmapsTheodore Ts'o
commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream. When reading the inode or block allocation bitmap, if the bitmap needs to be initialized, do not update the checksum in the block group descriptor. That's because we're not set up to journal those changes. Instead, just set the verified bit on the bitmap block, so that it's not necessary to validate the checksum. When a block or inode allocation actually happens, at that point the checksum will be calculated, and update of the bg descriptor block will be properly journalled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24jbd2: if the journal is aborted then don't allow update of the log tailTheodore Ts'o
commit 85e0c4e89c1b864e763c4e3bb15d0b6d501ad5d9 upstream. This updates the jbd2 superblock unnecessarily, and on an abort we shouldn't truncate the log. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24fs/reiserfs/journal.c: add missing resierfs_warning() argAndrew Morton
commit 9ad553abe66f8be3f4755e9fa0a6ba137ce76341 upstream. One use of the reiserfs_warning() macro in journal_init_dev() is missing a parameter, causing the following warning: REISERFS warning (device loop0): journal_init_dev: Cannot open '%s': %i journal_init_dev: This also causes a WARN_ONCE() warning in the vsprintf code, and then a panic if panic_on_warn is set. Please remove unsupported %/ in format string WARNING: CPU: 1 PID: 4480 at lib/vsprintf.c:2138 format_decode+0x77f/0x830 lib/vsprintf.c:2138 Kernel panic - not syncing: panic_on_warn set ... Just add another string argument to the macro invocation. Addresses https://syzkaller.appspot.com/bug?id=0627d4551fdc39bf1ef5d82cd9eef587047f7718 Link: http://lkml.kernel.org/r/d678ebe1-6f54-8090-df4c-b9affad62293@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: <syzbot+6bd77b88c1977c03f584@syzkaller.appspotmail.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24ubifs: Check ubifs_wbuf_sync() return codeRichard Weinberger
commit aac17948a7ce01fb60b9ee6cf902967a47b3ce26 upstream. If ubifs_wbuf_sync() fails we must not write a master node with the dirty marker cleared. Otherwise it is possible that in case of an IO error while syncing we mark the filesystem as clean and UBIFS refuses to recover upon next mount. Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24getname_kernel() needs to make sure that ->name != ->iname in long caseAl Viro
commit 30ce4d1903e1d8a7ccd110860a5eef3c638ed8be upstream. missed it in "kill struct filename.separate" several years ago. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13ovl: filter trusted xattr for non-adminMiklos Szeredi
[ Upstream commit a082c6f680da298cf075886ff032f32ccb7c5e1a ] Filesystems filter out extended attributes in the "trusted." domain for unprivlieged callers. Overlay calls underlying filesystem's method with elevated privs, so need to do the filtering in overlayfs too. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()Eryu Guan
[ Upstream commit 624327f8794704c5066b11a52f9da6a09dce7f9a ] ext4_find_unwritten_pgoff() is used to search for offset of hole or data in page range [index, end] (both inclusive), and the max number of pages to search should be at least one, if end == index. Otherwise the only page is missed and no hole or data is found, which is not correct. When block size is smaller than page size, this can be demonstrated by preallocating a file with size smaller than page size and writing data to the last block. E.g. run this xfs_io command on a 1k block size ext4 on x86_64 host. # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \ -c "seek -d 0" /mnt/ext4/testfile wrote 1024/1024 bytes at offset 2048 1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec) Whence Result DATA EOF Data at offset 2k was missed, and lseek(2) returned ENXIO. This is unconvered by generic/285 subtest 07 and 08 on ppc64 host, where pagesize is 64k. Because a recent change to generic/285 reduced the preallocated file size to smaller than 64k. Signed-off-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()Dan Carpenter
[ Upstream commit 662f9a105b4322b8559d448f86110e6ec24b8738 ] If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime, but obviously we intended to set an error code here. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errorsKonstantin Khlebnikov
[ Upstream commit 9651e6b2e20648d04d5e1fe6479a3056047e8781 ] I've got another report about breaking ext4 by ENOMEM error returned from ext4_mb_load_buddy() caused by memory shortage in memory cgroup. This time inside ext4_discard_preallocations(). This patch replaces ext4_error() with ext4_warning() where errors returned from ext4_mb_load_buddy() are not fatal and handled by caller: * ext4_mb_discard_group_preallocations() - called before generating ENOSPC, we'll try to discard other group or return ENOSPC into user-space. * ext4_trim_all_free() - just stop trimming and return ENOMEM from ioctl. Some callers cannot handle errors, thus __GFP_NOFAIL is used for them: * ext4_discard_preallocations() * ext4_mb_discard_lg_preallocations() Fixes: adb7ef600cc9 ("ext4: use __GFP_NOFAIL in ext4_free_blocks()") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13btrfs: fix incorrect error return ret being passed to mapping_set_errorColin Ian King
[ Upstream commit bff5baf8aa37a97293725a16c03f49872249c07e ] The setting of return code ret should be based on the error code passed into function end_extent_writepage and not on ret. Thanks to Liu Bo for spotting this mistake in the original fix I submitted. Detected by CoverityScan, CID#1414312 ("Logically dead code") Fixes: 5dca6eea91653e ("Btrfs: mark mapping with error flag to report errors to userspace") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13SMB2: Fix share type handlingChristophe JAILLET
[ Upstream commit cd1230070ae1c12fd34cf6a557bfa81bf9311009 ] In fs/cifs/smb2pdu.h, we have: #define SMB2_SHARE_TYPE_DISK 0x01 #define SMB2_SHARE_TYPE_PIPE 0x02 #define SMB2_SHARE_TYPE_PRINT 0x03 Knowing that, with the current code, the SMB2_SHARE_TYPE_PRINT case can never trigger and printer share would be interpreted as disk share. So, test the ShareType value for equality instead. Fixes: faaf946a7d5b ("CIFS: Add tree connect/disconnect capability for SMB2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13CIFS: silence lockdep splat in cifs_relock_file()Rabin Vincent
[ Upstream commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d ] cifs_relock_file() can perform a down_write() on the inode's lock_sem even though it was already performed in cifs_strict_readv(). Lockdep complains about this. AFAICS, there is no problem here, and lockdep just needs to be told that this nesting is OK. ============================================= [ INFO: possible recursive locking detected ] 4.11.0+ #20 Not tainted --------------------------------------------- cat/701 is trying to acquire lock: (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00 but task is already holding lock: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&cifsi->lock_sem); lock(&cifsi->lock_sem); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by cat/701: #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310 stack backtrace: CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20 Call Trace: dump_stack+0x85/0xc2 __lock_acquire+0x17dd/0x2260 ? trace_hardirqs_on_thunk+0x1a/0x1c ? preempt_schedule_irq+0x6b/0x80 lock_acquire+0xcc/0x260 ? lock_acquire+0xcc/0x260 ? cifs_reopen_file+0x7a7/0xc00 down_read+0x2d/0x70 ? cifs_reopen_file+0x7a7/0xc00 cifs_reopen_file+0x7a7/0xc00 ? printk+0x43/0x4b cifs_readpage_worker+0x327/0x8a0 cifs_readpage+0x8c/0x2a0 generic_file_read_iter+0x692/0xd00 cifs_strict_readv+0x29f/0x310 generic_file_splice_read+0x11c/0x1c0 do_splice_to+0xa5/0xc0 splice_direct_to_actor+0xfa/0x350 ? generic_pipe_buf_nosteal+0x10/0x10 do_splice_direct+0xb5/0xe0 do_sendfile+0x278/0x3a0 SyS_sendfile64+0xc4/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13NFSv4.1: Work around a Linux server bug...Trond Myklebust
[ Upstream commit f4b23de3dda1536590787c9e5c3d16b8738ab108 ] It turns out the Linux server has a bug in its implementation of supattr_exclcreat; it returns the set of all attributes, whether or not they are supported by minor version 1. In order to avoid a regression, we therefore apply the supported_attrs as a mask on top of whatever the server sent us. Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13lockd: fix lockd shutdown raceJ. Bruce Fields
[ Upstream commit efda760fe95ea15291853c8fa9235c32d319cd98 ] As reported by David Jeffery: "a signal was sent to lockd while lockd was shutting down from a request to stop nfs. The signal causes lockd to call restart_grace() which puts the lockd_net structure on the grace list. If this signal is received at the wrong time, it will occur after lockd_down_net() has called locks_end_grace() but before lockd_down_net() stops the lockd thread. This leads to lockd putting the lockd_net structure back on the grace list, then exiting without anything removing it from the list." So, perform the final locks_end_grace() from the the lockd thread; this ensures it's serialized with respect to restart_grace(). Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust
[ Upstream commit 0048fdd06614a4ea088f9fcad11511956b795698 ] If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08fs/proc: Stop trying to report thread stacksAndy Lutomirski
commit b18cb64ead400c01bf1580eeba330ace51f8087d upstream. This reverts more of: b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") ... which was partially reverted by: 65376df58217 ("proc: revert /proc/<pid>/maps [stack:TID] annotation") Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps. In current kernels, /proc/PID/maps (or /proc/TID/maps even for threads) shows "[stack]" for VMAs in the mm's stack address range. In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the target thread's stack's VMA. This is racy, probably returns garbage and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone: KSTK_ESP is not safe to use on tasks that aren't known to be running ordinary process-context kernel code. This patch removes the difference and just shows "[stack]" for VMAs in the mm's stack range. This is IMO much more sensible -- the actual "stack" address really is treated specially by the VM code, and the current thread stack isn't even well-defined for programs that frequently switch stacks on their own. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux API <linux-api@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tycho Andersen <tycho.andersen@canonical.com> Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08fs: compat: Remove warning from COMPATIBLE_IOCTLMark Charlebois
commit 9280cdd6fe5b8287a726d24cc1d558b96c8491d7 upstream. cmd in COMPATIBLE_IOCTL is always a u32, so cast it so there isn't a warning about an overflow in XFORM. From: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28staging: ncpfs: memory corruption in ncp_read_kernel()Dan Carpenter
commit 4c41aa24baa4ed338241d05494f2c595c885af8f upstream. If the server is malicious then *bytes_read could be larger than the size of the "target" buffer. It would lead to memory corruption when we do the memcpy(). Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24nfsd4: permit layoutget of executable-only filesBenjamin Coddington
[ Upstream commit 66282ec1cf004c09083c29cb5e49019037937bbd ] Clients must be able to read a file in order to execute it, and for pNFS that means the client needs to be able to perform a LAYOUTGET on the file. This behavior for executable-only files was added for OPEN in commit a043226bc140 "nfsd4: permit read opens of executable-only files". This fixes up xfstests generic/126 on block/scsi layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24cifs: small underflow in cnvrtDosUnixTm()Dan Carpenter
[ Upstream commit 564277eceeca01e02b1ef3e141cfb939184601b4 ] January is month 1. There is no zero-th month. If someone passes a zero month then it means we read from one space before the start of the total_days_of_prev_months[] array. We may as well also be strict about days as well. Fixes: 1bd5bbcb6531 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24Btrfs: send, fix file hole not being preserved due to inline extentFilipe Manana
[ Upstream commit e1cbfd7bf6dabdac561c75d08357571f44040a45 ] Normally we don't have inline extents followed by regular extents, but there's currently at least one harmless case where this happens. For example, when the page size is 4Kb and compression is enabled: $ mkfs.btrfs -f /dev/sdb $ mount -o compress /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "fsync" /mnt/foobar $ xfs_io -c "pwrite -S 0xbb 8K 4K" -c "fsync" /mnt/foobar In this case we get a compressed inline extent, representing 4Kb of data, followed by a hole extent and then a regular data extent. The inline extent was not expanded/converted to a regular extent exactly because it represents 4Kb of data. This does not cause any apparent problem (such as the issue solved by commit e1699d2d7bf6 ("btrfs: add missing memset while reading compressed inline extents")) except trigger an unexpected case in the incremental send code path that makes us issue an operation to write a hole when it's not needed, resulting in more writes at the receiver and wasting space at the receiver. So teach the incremental send code to deal with this particular case. The issue can be currently triggered by running fstests btrfs/137 with compression enabled (MOUNT_OPTIONS="-o compress" ./check btrfs/137). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>