summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-02-11nilfs2: fix fix very long mount time issueVyacheslav Dubeyko
commit a9bae189542e71f91e61a4428adf6e5a7dfe8063 upstream. There exists a situation when GC can work in background alone without any other filesystem activity during significant time. The nilfs_clean_segments() method calls nilfs_segctor_construct() that updates superblocks in the case of NILFS_SC_SUPER_ROOT and THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag. As a result, the update of superblocks doesn't occurred all this time and in the case of SPOR superblocks keep very old values of last super root placement. SYMPTOMS: Trying to mount a NILFS2 volume after SPOR in such environment ends with very long mounting time (it can achieve about several hours in some cases). REPRODUCING PATH: 1. It needs to use external USB HDD, disable automount and doesn't make any additional filesystem activity on the NILFS2 volume. 2. Generate temporary file with size about 100 - 500 GB (for example, dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of file defines duration of GC working. 3. Then it needs to delete file. 4. Start GC manually by means of command "nilfs-clean -p 0". When you start GC by means of such way then, at the end, superblocks is updated by once. So, for simulation of SPOR, it needs to wait sometime (15 - 40 minutes) and simply switch off USB HDD manually. 5. Switch on USB HDD again and try to mount NILFS2 volume. As a result, NILFS2 volume will mount during very long time. REPRODUCIBILITY: 100% FIX: This patch adds checking that superblocks need to update and set THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call. Reported-by: Sergey Alexandrov <splavgm@gmail.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03xfs: fix periodic log flushingDave Chinner
[Please take this patch for -stable in kernels 3.5-3.7. It doesn't have an equivalent upstream commit because the code was removed before the bug was discovered. See f661f1e0bf50 and 7e18530bef6a.] There is a logic inversion in xfssyncd_worker() which means that the log is not periodically forced or idled correctly. This means that metadata changes aggregated in memory do not get flushed in a timely manner, and hence if filesystem is not cleanly unmounted those changes can be lost. This loss can manifest itself even hours after the changes were made if the filesystem is left to idle without a sync() occurring between the last modification and the crash/shutdown occuring. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03xfs: fix _xfs_buf_find oops on blocks beyond the filesystem endDave Chinner
commit eb178619f930fa2ba2348de332a1ff1c66a31424 upstream. When _xfs_buf_find is passed an out of range address, it will fail to find a relevant struct xfs_perag and oops with a null dereference. This can happen when trying to walk a filesystem with a metadata inode that has a partially corrupted extent map (i.e. the block number returned is corrupt, but is otherwise intact) and we try to read from the corrupted block address. In this case, just fail the lookup. If it is readahead being issued, it will simply not be done, but if it is real read that fails we will get an error being reported. Ideally this case should result in an EFSCORRUPTED error being reported, but we cannot return an error through xfs_buf_read() or xfs_buf_get() so this lookup failure may result in ENOMEM or EIO errors being reported instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Cc: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 sessionTrond Myklebust
commit c489ee290bdbbace6bb63ebe6ebd4dd605819495 upstream. NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It usually means that the server is busy handling an unfinished RPC request. Just sleep for a second and then retry. We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return value. If the NFS server has outstanding callbacks, we just want to similarly sleep & retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recoveryTrond Myklebust
commit 65436ec0c8e344d9b23302b686e418f2a7b7cf7b upstream. We do need to start the lease recovery thread prior to waiting for the client initialisation to complete in NFSv4.1. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFSv4: Fix NFSv4 trunking discoveryTrond Myklebust
commit 202c312dba7d95b96493b412c606163a0cd83984 upstream. If walking the list in nfs4[01]_walk_client_list fails, then the most likely explanation is that the server dropped the clientid before we actually managed to confirm it. As long as our nfs_client is the very last one in the list to be tested, the caller can be assured that this is the case when the final return value is NFS4ERR_STALE_CLIENTID. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFSv4: Fix NFSv4 reference counting for trunked sessionsTrond Myklebust
commit 4ae19c2dd713edb7b8ad3d4ab9d234ed5dcb6b98 upstream. The reference counting in nfs4_init_client assumes wongly that it is safe for nfs4_discover_server_trunking() to return a pointer to a nfs_client prior to bumping the reference count. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFS: Don't silently fail setattr() requests on mountpointsTrond Myklebust
commit ab225417825963b6dc66be7ea80f94ac1378dfdf upstream. Ensure that any setattr and getattr requests for junctions and/or mountpoints are sent to the server. Ever since commit 0ec26fd0698 (vfs: automount should ignore LOOKUP_FOLLOW), we have silently dropped any setattr requests to a server-side mountpoint. For referrals, we have silently dropped both getattr and setattr requests. This patch restores the original behaviour for setattr on mountpoints, and tries to do the same for referrals, provided that we have a filehandle... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03NFS: Fix error reporting in nfs_xdev_mountTrond Myklebust
commit dee972b967ae111ad5705733de17a3bfc4632311 upstream. Currently, nfs_xdev_mount converts all errors from clone_server() to ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that. Also ensure that if nfs_fs_mount_common() returns an error, we don't dprintk(0)... The regression originated in commit 3d176e3fe4f6dc379b252bf43e2e146a8f7caf01 (NFS: Use nfs_fs_mount_common() for xdev mounts) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03xfs: Fix possible use-after-free with AIOJan Kara
commit 4b05d09c18d9aa62d2e7fb4b057f54e5a38963f5 upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> CC: xfs@oss.sgi.com CC: Ben Myers <bpm@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03fs/cifs/cifs_dfs_ref.c: fix potential memory leakageCong Ding
commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream. When it goes to error through line 144, the memory allocated to *devname is not freed, and the caller doesn't free it either in line 250. So we free the memroy of *devname in function cifs_compose_mount_options() when it goes to error. Signed-off-by: Cong Ding <dinggnu@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21ext4: init pagevec in ext4_da_block_invalidatepagesEric Sandeen
commit 66bea92c69477a75a5d37b9bfed5773c92a3c4b4 upstream. ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17libceph: Unlock unprocessed pages in start_read() error pathDavid Zafman
(cherry picked from commit 8884d53dd63b1d9315b343564fcbe1ede004a99e) Function start_read() can get an error before processing all pages. It must not only release the remaining pages, but unlock them too. This fixes http://tracker.newdream.net/issues/3370 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ceph: call handle_cap_grant() for cap import messageYan, Zheng
(cherry picked from commit 0e5e1774a92e6fe9c511585de8f078b4c4c68dbb) If client sends cap message that requests new max size during exporting caps, the exporting MDS will drop the message quietly. So the client may wait for the reply that updates the max size forever. call handle_cap_grant() for cap import message can avoid this issue. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ceph: Fix __ceph_do_pending_vmtruncateYan, Zheng
(cherry picked from commit a85f50b6ef93fbbb2ae932ce9b2376509d172796) we should set i_truncate_pending to 0 after page cache is truncated to i_truncate_size Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ceph: Don't add dirty inode to dirty list if caps is in migrationYan, Zheng
(cherry picked from commit 0685235ffd9dbdb9ccbda587f8a3c83ad1d5a921) Add dirty inode to cap_dirty_migrating list instead, this can avoid ceph_flush_dirty_caps() entering infinite loop. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ceph: Fix infinite loop in __wake_requestsYan, Zheng
(cherry picked from commit ed75ec2cd19b47efcd292b6e23f58e56f4c5bc34) __wake_requests() will enter infinite loop if we use it to wake requests in the session->s_waiting list. __wake_requests() deletes requests from the list and __do_request() adds requests back to the list. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ceph: Don't update i_max_size when handling non-auth capYan, Zheng
(cherry picked from commit 5e62ad30157d0da04cf40c6d1a2f4bc840948b9c) The cap from non-auth mds doesn't have a meaningful max_size value. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17libceph: remove 'osdtimeout' optionSage Weil
(cherry picked from commit 83aff95eb9d60aff5497e9f44a2ae906b86d8e88) This would reset a connection with any OSD that had an outstanding request that was taking more than N seconds. The idea was that if the OSD was buggy, the client could compensate by resending the request. In reality, this only served to hide server bugs, and we haven't actually seen such a bug in quite a while. Moreover, the userspace client code never did this. More importantly, often the request is taking a long time because the OSD is trying to recover, or overloaded, and killing the connection and retrying would only make the situation worse by giving the OSD more work to do. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17vfs: add missing virtual cache flush after editing partial pagesLinus Torvalds
commit 6d283dba3721cc43be014b50a1acc2f35860a65a upstream. Andrew Morton pointed this out a month ago, and then I completely forgot about it. If we read a partial last page of a block device, we will zero out the end of the page, but since that page can then be mapped into user space, we should also make sure to flush the cache on architectures that have virtual caches. We have the flush_dcache_page() function for this, so use it. Now, in practice this really never matters, because nobody sane uses virtual caches to begin with, and they largely exist on old broken RISC arhitectures. And even if you did run on one of those obsolete CPU's, the whole "mmap and access the last partial page of a block device" behavior probably doesn't actually exist. The normal IO functions (read/write) will never see the zeroed-out part of the page that migth not be coherent in the cache, because they honor the size of the device. So I'm marking this for stable (3.7 only), but I'm not sure anybody will ever care. Pointed-out-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17epoll: prevent missed events on EPOLL_CTL_MODEric Wong
commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream. EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong <normalperson@yhbt.net> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Voellmy <andreas.voellmy@yale.edu> Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu> Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17udf: don't increment lenExtents while writing to a holeNamjae Jeon
commit fb719c59bdb4fca86ee1fd1f42ab3735ca12b6b2 upstream. Incrementing lenExtents even while writing to a hole is bad for performance as calls to udf_discard_prealloc and udf_truncate_tail_extent would not return from start if isize != lenExtents Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17udf: fix memory leak while allocating blocks during writeNamjae Jeon
commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream. Need to brelse the buffer_head stored in cur_epos and next_epos. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: release buffer in failed path in dx_probe()Guo Chao
commit 0ecaef0644973e9006fdbc6974301047aaff9bc6 upstream. If checksum fails, we should also release the buffer read from previous iteration. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: "Darrick J. Wong" <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o
commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream. When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: lock i_mutex when truncating orphan inodesTheodore Ts'o
commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream. Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: do not try to write superblock on ro remount w/o journalMichael Tokarev
commit d096ad0f79a782935d2e06ae8fb235e8c5397775 upstream. When a journal-less ext4 filesystem is mounted on a read-only block device (blockdev --setro will do), each remount (for other, unrelated, flags, like suid=>nosuid etc) results in a series of scary messages from kernel telling about I/O errors on the device. This is becauese of the following code ext4_remount(): if (sbi->s_journal == NULL) ext4_commit_super(sb, 1); at the end of remount procedure, which forces writing (flushing) of a superblock regardless whenever it is dirty or not, if the filesystem is readonly or not, and whenever the device itself is readonly or not. We only need call ext4_commit_super when the file system had been previously mounted read/write. Thanks to Eric Sandeen for help in diagnosing this issue. Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17jbd2: fix assertion failure in jbd2_journal_flush()Jan Kara
commit d7961c7fa4d2e3c3f12be67e21ba8799b5a7238a upstream. The following race is possible between start_this_handle() and someone calling jbd2_journal_flush(). Process A Process B start_this_handle(). if (journal->j_barrier_count) # false if (!journal->j_running_transaction) { #true read_unlock(&journal->j_state_lock); jbd2_journal_lock_updates() jbd2_journal_flush() write_lock(&journal->j_state_lock); if (journal->j_running_transaction) { # false ... wait for committing trans ... write_unlock(&journal->j_state_lock); ... write_lock(&journal->j_state_lock); if (!journal->j_running_transaction) { # true jbd2_get_transaction(journal, new_transaction); write_unlock(&journal->j_state_lock); goto repeat; # eventually blocks on j_barrier_count > 0 ... J_ASSERT(!journal->j_running_transaction); # fails We fix the race by rechecking j_barrier_count after reacquiring j_state_lock in exclusive mode. Reported-by: yjwsignal@empal.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: check dioread_nolock on remountJan Kara
commit 261cb20cb2f0737a247aaf08dff7eb065e3e5b66 upstream. Currently we allow enabling dioread_nolock mount option on remount for filesystems where blocksize < PAGE_CACHE_SIZE. This isn't really supported so fix the bug by moving the check for blocksize != PAGE_CACHE_SIZE into parse_options(). Change the original PAGE_SIZE to PAGE_CACHE_SIZE along the way because that's what we are really interested in. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix extent tree corruption caused by hole punchForrest Liu
commit c36575e663e302dbaa4d16b9c72d2c9a913a9aef upstream. When depth of extent tree is greater than 1, logical start value of interior node is not correctly updated in ext4_ext_rm_idx. Signed-off-by: Forrest Liu <forrestl@synology.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Ashish Sangwan <ashishsangwan2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17jffs2: hold erase_completion_lock on exitAlexey Khoroshilov
commit 2cbba75a56ea78e6876b4e2547a882f10b3fe72b upstream. Users of jffs2_do_reserve_space() expect they still held erase_completion_lock after call to it. But there is a path where jffs2_do_reserve_space() leaves erase_completion_lock unlocked. The patch fixes it. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix possible use after free with metadata csumTheodore Ts'o
commit aeb1e5d69a5be592e86a926be73efb38c55af404 upstream. Commit fa77dcfafeaa introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix memory leak in ext4_xattr_set_acl()'s error pathEugene Shatokhin
commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream. In ext4_xattr_set_acl(), if ext4_journal_start() returns an error, posix_acl_release() will not be called for 'acl' which may result in a memory leak. This patch fixes that. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17pstore/ram: Fix undefined usage of rounddown_pow_of_two(0)Maxime Bizon
commit b042e47491ba5f487601b5141a3f1d8582304170 upstream. record_size / console_size / ftrace_size can be 0 (this is how you disable the feature), but rounddown_pow_of_two(0) is undefined. As suggested by Kees Cook, use !is_power_of_2() as a condition to call rounddown_pow_of_two and avoid its undefined behavior on the value 0. This issue has been present since commit 1894a253 (ramoops: Move to fs/pstore/ram.c). Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Florian Fainelli <ffainelli@freebox.fr> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11cifs: don't compare uniqueids in cifs_prime_dcache unless server inode ↵Jeff Layton
numbers are in use commit 2f2591a34db6c9361faa316c91a6e320cb4e6aee upstream. Oliver reported that commit cd60042c caused his cifs mounts to continually thrash through new inodes on readdir. His servers are not sending inode numbers (or he's not using them), and the new test in that function doesn't account for that sort of setup correctly. If we're not using server inode numbers, then assume that the inode attached to the dentry hasn't changed. Go ahead and update the attributes in place, but keep the same inode number. Reported-and-Tested-by: Oliver Mössinger <Oliver.Moessinger@ichaus.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11cifs: rename cifs_readdir_lookup to cifs_prime_dcache and make it void returnJeff Layton
commit eb1b3fa5cdb9c27bdec8f262acf757a06588eb2d upstream. The caller doesn't do anything with the dentry, so there's no point in holding a reference to it on return. Also cifs_prime_dcache better describes the actual purpose of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11proc: pid/status: show all supplementary groupsArtem Bityutskiy
commit 8d238027b87e654be552eabdf492042a34c5c300 upstream. We display a list of supplementary group for each process in /proc/<pid>/status. However, we show only the first 32 groups, not all of them. Although this is rare, but sometimes processes do have more than 32 supplementary groups, and this kernel limitation breaks user-space apps that rely on the group list in /proc/<pid>/status. Number 32 comes from the internal NGROUPS_SMALL macro which defines the length for the internal kernel "small" groups buffer. There is no apparent reason to limit to this value. This patch removes the 32 groups printing limit. The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX, which is currently set to 65536. And this is the maximum count of groups we may possibly print. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11cifs: adjust sequence number downward after signing NT_CANCEL requestJeff Layton
commit 31efee60f489c759c341454d755a9fd13de8c03d upstream. When a call goes out, the signing code adjusts the sequence number upward by two to account for the request and the response. An NT_CANCEL however doesn't get a response of its own, it just hurries the server along to get it to respond to the original request more quickly. Therefore, we must adjust the sequence number back down by one after signing a NT_CANCEL request. Reported-by: Tim Perry <tdparmor-sambabugs@yahoo.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11cifs: move check for NULL socket into smb_send_rqstJeff Layton
commit ea702b80e0bbb2448e201472127288beb82ca2fe upstream. Cai reported this oops: [90701.616664] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [90701.625438] IP: [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60 [90701.632167] PGD fea319067 PUD 103fda4067 PMD 0 [90701.637255] Oops: 0000 [#1] SMP [90701.640878] Modules linked in: des_generic md4 nls_utf8 cifs dns_resolver binfmt_misc tun sg igb iTCO_wdt iTCO_vendor_support lpc_ich pcspkr i2c_i801 i2c_core i7core_edac edac_core ioatdma dca mfd_core coretemp kvm_intel kvm crc32c_intel microcode sr_mod cdrom ata_generic sd_mod pata_acpi crc_t10dif ata_piix libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [90701.677655] CPU 10 [90701.679808] Pid: 9627, comm: ls Tainted: G W 3.7.1+ #10 QCI QSSC-S4R/QSSC-S4R [90701.688950] RIP: 0010:[<ffffffff814a343e>] [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60 [90701.698383] RSP: 0018:ffff88177b431bb8 EFLAGS: 00010206 [90701.704309] RAX: ffff88177b431fd8 RBX: 00007ffffffff000 RCX: ffff88177b431bec [90701.712271] RDX: 0000000000000003 RSI: 0000000000000006 RDI: 0000000000000000 [90701.720223] RBP: ffff88177b431bc8 R08: 0000000000000004 R09: 0000000000000000 [90701.728185] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 [90701.736147] R13: ffff88184ef92000 R14: 0000000000000023 R15: ffff88177b431c88 [90701.744109] FS: 00007fd56a1a47c0(0000) GS:ffff88105fc40000(0000) knlGS:0000000000000000 [90701.753137] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [90701.759550] CR2: 0000000000000028 CR3: 000000104f15f000 CR4: 00000000000007e0 [90701.767512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [90701.775465] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [90701.783428] Process ls (pid: 9627, threadinfo ffff88177b430000, task ffff88185ca4cb60) [90701.792261] Stack: [90701.794505] 0000000000000023 ffff88177b431c50 ffff88177b431c38 ffffffffa014fcb1 [90701.802809] ffff88184ef921bc 0000000000000000 00000001ffffffff ffff88184ef921c0 [90701.811123] ffff88177b431c08 ffffffff815ca3d9 ffff88177b431c18 ffff880857758000 [90701.819433] Call Trace: [90701.822183] [<ffffffffa014fcb1>] smb_send_rqst+0x71/0x1f0 [cifs] [90701.828991] [<ffffffff815ca3d9>] ? schedule+0x29/0x70 [90701.834736] [<ffffffffa014fe6d>] smb_sendv+0x3d/0x40 [cifs] [90701.841062] [<ffffffffa014fe96>] smb_send+0x26/0x30 [cifs] [90701.847291] [<ffffffffa015801f>] send_nt_cancel+0x6f/0xd0 [cifs] [90701.854102] [<ffffffffa015075e>] SendReceive+0x18e/0x360 [cifs] [90701.860814] [<ffffffffa0134a78>] CIFSFindFirst+0x1a8/0x3f0 [cifs] [90701.867724] [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs] [90701.875601] [<ffffffffa013f731>] ? build_path_from_dentry+0xf1/0x260 [cifs] [90701.883477] [<ffffffffa01578e6>] cifs_query_dir_first+0x26/0x30 [cifs] [90701.890869] [<ffffffffa015480d>] initiate_cifs_search+0xed/0x250 [cifs] [90701.898354] [<ffffffff81195970>] ? fillonedir+0x100/0x100 [90701.904486] [<ffffffffa01554cb>] cifs_readdir+0x45b/0x8f0 [cifs] [90701.911288] [<ffffffff81195970>] ? fillonedir+0x100/0x100 [90701.917410] [<ffffffff81195970>] ? fillonedir+0x100/0x100 [90701.923533] [<ffffffff81195970>] ? fillonedir+0x100/0x100 [90701.929657] [<ffffffff81195848>] vfs_readdir+0xb8/0xe0 [90701.935490] [<ffffffff81195b9f>] sys_getdents+0x8f/0x110 [90701.941521] [<ffffffff815d3b99>] system_call_fastpath+0x16/0x1b [90701.948222] Code: 66 90 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 53 48 83 ec 08 83 fe 01 48 8b 98 48 e0 ff ff 48 c7 80 48 e0 ff ff ff ff ff ff 74 22 <48> 8b 47 28 ff 50 68 65 48 8b 14 25 f0 c6 00 00 48 89 9a 48 e0 [90701.970313] RIP [<ffffffff814a343e>] kernel_setsockopt+0x2e/0x60 [90701.977125] RSP <ffff88177b431bb8> [90701.981018] CR2: 0000000000000028 [90701.984809] ---[ end trace 24bd602971110a43 ]--- This is likely due to a race vs. a reconnection event. The current code checks for a NULL socket in smb_send_kvec, but that's too late. By the time that check is done, the socket will already have been passed to kernel_setsockopt. Move the check into smb_send_rqst, so that it's checked earlier. In truth, this is a bit of a half-assed fix. The -ENOTSOCK error return here looks like it could bubble back up to userspace. The locking rules around the ssocket pointer are really unclear as well. There are cases where the ssocket pointer is changed without holding the srv_mutex, but I'm not clear whether there's a potential race here yet or not. This code seems like it could benefit from some fundamental re-think of how the socket handling should behave. Until then though, this patch should at least fix the above oops in most cases. Reported-and-Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11fs: Fix imbalance in freeze protection in mark_files_ro()Jan Kara
commit 72651cac884b1e285fa8e8314b10e9f1b8458802 upstream. File descriptors (even those for writing) do not hold freeze protection. Thus mark_files_ro() must call __mnt_drop_write() to only drop protection against remount read-only. Calling mnt_drop_write_file() as we do now results in: [ BUG: bad unlock balance detected! ] 3.7.0-rc6-00028-g88e75b6 #101 Not tainted ------------------------------------- kworker/1:2/79 is trying to release lock (sb_writers) at: [<ffffffff811b33b4>] mnt_drop_write+0x24/0x30 but there are no more locks to release! Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11xfs: fix stray dquot unlock when reclaiming dquotsDave Chinner
commit b870553cdecb26d5291af09602352b763e323df2 upstream. When we fail to get a dquot lock during reclaim, we jump to an error handler that unlocks the dquot. This is wrong as we didn't lock the dquot, and unlocking it means who-ever is holding the lock has had it silently taken away, and hence it results in a lock imbalance. Found by inspection while modifying the code for the numa-lru patchset. This fixes a random hang I've been seeing on xfstest 232 for the past several months. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11xfs: fix direct IO nested transaction deadlock.Dave Chinner
commit 437a255aa23766666aec78af63be4c253faa8d57 upstream. The direct IO path can do a nested transaction reservation when writing past the EOF. The first transaction is the append transaction for setting the filesize at IO completion, but we can also need a transaction for allocation of blocks. If the log is low on space due to reservations and small log, the append transaction can be granted after wating for space as the only active transaction in the system. This then attempts a reservation for an allocation, which there isn't space in the log for, and the reservation sleeps. The result is that there is nothing left in the system to wake up all the processes waiting for log space to come free. The stack trace that shows this deadlock is relatively innocuous: xlog_grant_head_wait xlog_grant_head_check xfs_log_reserve xfs_trans_reserve xfs_iomap_write_direct __xfs_get_blocks xfs_get_blocks_direct do_blockdev_direct_IO __blockdev_direct_IO xfs_vm_direct_IO generic_file_direct_write xfs_file_dio_aio_writ xfs_file_aio_write do_sync_write vfs_write This was discovered on a filesystem with a log of only 10MB, and a log stripe unit of 256k whih increased the base reservations by 512k. Hence a allocation transaction requires 1.2MB of log space to be available instead of only 260k, and so greatly increased the chance that there wouldn't be enough log space available for the nested transaction to succeed. The key to reproducing it is this mkfs command: mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV The test case was a 1000 fsstress processes running with random freeze and unfreezes every few seconds. Thanks to Eryu Guan (eguan@redhat.com) for writing the test that found this on a system with a somewhat unique default configuration.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11tcp: fix MSG_SENDPAGE_NOTLAST logicEric Dumazet
[ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ] commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all frags but the last one for a splice() call. The condition used to set the flag in pipe_to_sendpage() relied on splice() user passing the exact number of bytes present in the pipe, or a smaller one. But some programs pass an arbitrary high value, and the test fails. The effect of this bug is a lack of tcp_push() at the end of a splice(pipe -> socket) call, and possibly very slow or erratic TCP sessions. We should both test sd->total_len and fact that another fragment is in the pipe (pipe->nrbufs > 1) Many thanks to Willy for providing very clear bug report, bisection and test programs. Reported-by: Willy Tarreau <w@1wt.eu> Bisected-by: Willy Tarreau <w@1wt.eu> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11SMB3 mounts fail with access denied to some serversSteve French
commit 52c0f4ad8ed462d81f1d37f56a74a71dc0c9bf0f upstream. We were checking incorrectly if signatures were required to be sent, so were always sending signatures after the initial session establishment. For SMB3 mounts (vers=3.0) this was a problem because we were putting SMB2 signatures in SMB3 requests which would cause access denied on mount (the tree connection would fail). This might also be worth considering for stable (for 3.7), as the error message on mount (access denied) is confusing to users and there is no workaround if the server is configured to only support smb3.0. I am ok either way. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11vfs: d_obtain_alias() needs to use "/" as default name.NeilBrown
commit b911a6bdeef5848c468597d040e3407e0aee04ce upstream. NFS appears to use d_obtain_alias() to create the root dentry rather than d_make_root. This can cause 'prepend_path()' to complain that the root has a weird name if an NFS filesystem is lazily unmounted. e.g. if "/mnt" is an NFS mount then { cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; } will cause a WARN message like WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0() ... Root dentry has weird name <> to appear in kernel logs. So change d_obtain_alias() to use "/" rather than "" as the anonymous name. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11nfs: avoid dereferencing null pointer in initiate_bulk_drainingNickolai Zeldovich
commit ecf0eb9edbb607d74f74b73c14af8b43f3729528 upstream. Fix an inverted null pointer check in initiate_bulk_draining(). Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11NFS: Ensure that we free the rpc_task after read and write cleanups are doneTrond Myklebust
commit 6db6dd7d3fd8f7c765dabc376493d6791ab28bd6 upstream. This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11nfs: fix null checking in nfs_get_option_str()Xi Wang
commit e25fbe380c4e3c09afa98bcdcd9d3921443adab8 upstream. The following null pointer check is broken. *option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!*option' instead. The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11pnfs: Increase the refcount when LAYOUTGET fails the first timeYanchuan Nian
commit 39e88fcfb1d5c6c4b1ff76ca2ab76cf449b850e8 upstream. The layout will be set unusable if LAYOUTGET fails. Is it reasonable to increase the refcount iff LAYOUTGET fails the first time? Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11NFS: Fix access to suid/sgid executablesWeston Andros Adamson
commit f8d9a897d4384b77f13781ea813156568f68b83e upstream. nfs_open_permission_mask() should only check MAY_EXEC for files that are opened with __FMODE_EXEC. Also fix NFSv4 access-in-open path in a similar way -- openflags must be used because fmode will not always have FMODE_EXEC set. This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>