summaryrefslogtreecommitdiff
path: root/fs/splice.c
AgeCommit message (Collapse)Author
2019-06-11fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23splice: don't merge into linked buffersJann Horn
commit a0ce2f0aa6ad97c3d4927bf2ca54bcebdf062d55 upstream. Before this patch, it was possible for two pipes to affect each other after data had been transferred between them with tee(): ============ $ cat tee_test.c int main(void) { int pipe_a[2]; if (pipe(pipe_a)) err(1, "pipe"); int pipe_b[2]; if (pipe(pipe_b)) err(1, "pipe"); if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write"); if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee"); if (write(pipe_b[1], "xx", 2) != 2) err(1, "write"); char buf[5]; if (read(pipe_a[0], buf, 4) != 4) err(1, "read"); buf[4] = 0; printf("got back: '%s'\n", buf); } $ gcc -o tee_test tee_test.c $ ./tee_test got back: 'abxx' $ ============ As suggested by Al Viro, fix it by creating a separate type for non-mergeable pipe buffers, then changing the types of buffers in splice_pipe_to_pipe() and link_pipe(). Cc: <stable@vger.kernel.org> Fixes: 7c77f0b3f920 ("splice: implement pipe to pipe splicing") Fixes: 70524490ee2e ("[PATCH] splice: add support for sys_tee()") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-23vfs: fix uninitialized flags in splice_to_pipe()Miklos Szeredi
commit 5a81e6a171cdbd1fa8bc1fdd80c23d3d71816fac upstream. Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the unused part of the pipe ring buffer. Previously splice_to_pipe() left the flags value alone, which could result in incorrect behavior. Uninitialized flags appears to have been there from the introduction of the splice syscall. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06splice: reinstate SIGPIPE/EPIPE handlingLinus Torvalds
commit 52bce91165e5f2db422b2b972e83d389e5e4725c upstream. Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") caused a regression when there were no more readers left on a pipe that was being spliced into: rather than the expected SIGPIPE and -EPIPE return value, the writer would end up waiting forever for space to free up (which obviously was not going to happen with no readers around). Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()") Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Debugged-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26fix default_file_splice_read()Al Viro
Botched calculation of number of pages. As the result, we were dropping pieces when doing splice to pipe from e.g. 9p. Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-10splice: remove detritus from generic_file_splice_read()Al Viro
i_size check is a leftover from the horrors that used to play with the page cache in that function. With the switch to ->read_iter(), it's neither needed nor correct - for gfs2 it ends up being buggy, since i_size is not guaranteed to be correct until later (inside ->read_iter()). Spotted-by: Abhi Das <adas@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10fix ITER_PIPE interaction with direct_IOAl Viro
by making sure we call iov_iter_advance() on original iov_iter even if direct_IO (done on its copy) has returned 0. It's a no-op for old iov_iter flavours and does the right thing (== truncation of the stuff we'd allocated, but not filled) in ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with by cleanup in generic_file_read_iter(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_confirm() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_release() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05pipe: add pipe_buf_get() helperMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch default_file_splice_read() to use of pipe-backed iov_iterAl Viro
we only use iov_iter_get_pages_alloc() and iov_iter_advance() - pages are filled by kernel_readv() via a kvec array (as we used to do all along), so iov_iter here is used only as a way of arranging for those pages to be in pipe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05switch generic_file_splice_read() to use of ->read_iter()Al Viro
... and kill the ->splice_read() instances that can be switched to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05new iov_iter flavour: pipe-backedAl Viro
iov_iter variant for passing data into pipe. copy_to_iter() copies data into page(s) it has allocated and stuffs them into the pipe; copy_page_to_iter() stuffs there a reference to the page given to it. Both will try to coalesce if possible. iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages() and friends will do as copy_to_iter() would have and return the pages where the data would've been copied. iov_iter_advance() will truncate everything past the spot it has advanced to. New primitive: iov_iter_pipe(), used for initializing those. pipe should be locked all along. Running out of space acts as fault would for iovec-backed ones; in other words, giving it to ->read_iter() may result in short read if the pipe overflows, or -EFAULT if it happens with nothing copied there. In other words, ->read_iter() on those acts pretty much like ->splice_read(). Moreover, all generic_file_splice_read() users, as well as many other ->splice_read() instances can be switched to that scheme - that'll happen in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03new helper: add_to_pipe()Al Viro
single-buffer analogue of splice_to_pipe(); vmsplice_to_pipe() switched to that, leaving splice_to_pipe() only for ->splice_read() instances (and that only until they are converted as well). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice: lift pipe_lock out of splice_to_pipe()Al Viro
* splice_to_pipe() stops at pipe overflow and does *not* take pipe_lock * ->splice_read() instances do the same * vmsplice_to_pipe() and do_splice() (ultimate callers of splice_to_pipe()) arrange for waiting, looping, etc. themselves. That should make pipe_lock the outermost one. Unfortunately, existing rules for the amount passed by vmsplice_to_pipe() and do_splice() are quite ugly _and_ userland code can be easily broken by changing those. It's not even "no more than the maximal capacity of this pipe" - it's "once we'd fed pipe->nr_buffers pages into the pipe, leave instead of waiting". Considering how poorly these rules are documented, let's try "wait for some space to appear, unless given SPLICE_F_NONBLOCK, then push into pipe and if we run into overflow, we are done". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice: switch get_iovec_page_array() to iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-03splice_to_pipe(): don't open-code wakeup_pipe_readers()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-11Merge branch 'ovl-fixes' into for-linusAl Viro
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-03do_splice_to(): cap the size before passing to ->splice_read()Al Viro
pipe capacity won't exceed 2G anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-18Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-nextAl Viro
2016-03-18splice: handle zero nr_pages in splice_to_pipe()Rabin Vincent
Running the following command: busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null with any tracing enabled pretty very quickly leads to various NULL pointer dereferences and VM BUG_ON()s, such as these: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) kernel BUG at include/linux/mm.h:367! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89 (busybox's cat uses sendfile(2), unlike the coreutils version) This is because tracing_splice_read_pipe() can call splice_to_pipe() with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and we fill the page pointers and the other fields of the pipe_buffers with garbage. All other callers of splice_to_pipe() avoid calling it when nr_pages == 0, and we could make tracing_splice_read_pipe() do that too, but it seems reasonable to have splice_to_page() handle this condition gracefully. Cc: stable@vger.kernel.org Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-04vfs: pass a flags argument to vfs_readv/vfs_writevChristoph Hellwig
This way we can set kiocb flags also from the sync read/write path for the read_iter/write_iter operations. For now there is no way to pass flags to plain read/write operations as there is no real need for that, and all flags passed are explicitly rejected for these files. Signed-off-by: Milosz Tanski <milosz@adfin.com> [hch: rebased on top of my kiocb changes] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGEAbhi Das
During testing, I discovered that __generic_file_splice_read() returns 0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first page of a single/multi-page splice read operation. This EOF return code causes the userspace test to (correctly) report a zero-length read error when it was expecting otherwise. The current strategy of returning a partial non-zero read when ->readpage returns AOP_TRUNCATED_PAGE works only when the failed page is not the first of the lot being processed. This patch attempts to retry lookup and call ->readpage again on pages that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my tests pass and I haven't noticed any unwanted side effects. This version removes the thrice-retry loop and instead indefinitely retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This behavior is now similar to do_generic_file_read(). Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23vfs: Avoid softlockups with sendfile(2)Jan Kara
The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-23vfs: Make sendfile(2) killable even betterJan Kara
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where sendfile(2) was doing a lot of tiny writes into a filesystem and thus was unkillable for a long time. However sendfile(2) can be (mis)used to issue lots of writes into arbitrary file descriptor such as evenfd or similar special file descriptors which never hit the standard filesystem write path and thus are still unkillable. E.g. the following example from Dmitry burns CPU for ~16s on my test system without possibility to be killed: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); There are actually quite a few tests for pending signals in sendfile code however we data to write is always available none of them seems to trigger. So fix the problem by adding a test for pending signal into splice_from_pipe_next() also before the loop waiting for pipe buffers to be available. This should fix all the lockup issues with sendfile of the do-ton-of-tiny-writes nature. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patchbomb from Andrew Morton: - a few misc things - ocfs2 udpates - kernel/watchdog.c feature work (took ages to get right) - most of MM. A few tricky bits are held up and probably won't make 4.2. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits) mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() mm, thp: respect MPOL_PREFERRED policy with non-local node tmpfs: truncate prealloc blocks past i_size mm/memory hotplug: print the last vmemmap region at the end of hot add memory mm/mmap.c: optimization of do_mmap_pgoff function mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan mm: kmemleak: avoid deadlock on the kmemleak object insertion error path mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup() mm: kmemleak: fix delete_object_*() race when called on the same memory block mm: kmemleak: allow safe memory scanning during kmemleak disabling memcg: convert mem_cgroup->under_oom from atomic_t to int memcg: remove unused mem_cgroup->oom_wakeups frontswap: allow multiple backends x86, mirror: x86 enabling - find mirrored memory ranges mm/memblock: allocate boot time data structures from mirrored memory mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute mm: do not ignore mapping_gfp_mask in page cache allocation paths mm/cma.c: fix typos in comments mm/oom_kill.c: print points as unsigned int mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages ...
2015-06-24mm: do not ignore mapping_gfp_mask in page cache allocation pathsMichal Hocko
page_cache_read, do_generic_file_read, __generic_file_splice_read and __ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling add_to_page_cache_lru which might cause recursion into fs down in the direct reclaim path if the mapping really relies on GFP_NOFS semantic. This doesn't seem to be the case now because page_cache_read (page fault path) doesn't seem to suffer from the reclaim recursion issues and do_generic_file_read and __generic_file_splice_read also shouldn't be called under fs locks which would deadlock in the reclaim path. Anyway it is better to obey mapping gfp mask and prevent from later breakage. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-25net: af_unix: implement splice for stream af_unix socketsHannes Frederic Sowa
unix_stream_recvmsg is refactored to unix_stream_read_generic in this patch and enhanced to deal with pipe splicing. The refactoring is inneglible, we mostly have to deal with a non-existing struct msghdr argument. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-06splice: sendfile() at once fails for big filesChristophe Leroy
Using sendfile with below small program to get MD5 sums of some files, it appear that big files (over 64kbytes with 4k pages system) get a wrong MD5 sum while small files get the correct sum. This program uses sendfile() to send a file to an AF_ALG socket for hashing. /* md5sum2.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <fcntl.h> #include <sys/socket.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/if_alg.h> int main(int argc, char **argv) { int sk = socket(AF_ALG, SOCK_SEQPACKET, 0); struct stat st; struct sockaddr_alg sa = { .salg_family = AF_ALG, .salg_type = "hash", .salg_name = "md5", }; int n; bind(sk, (struct sockaddr*)&sa, sizeof(sa)); for (n = 1; n < argc; n++) { int size; int offset = 0; char buf[4096]; int fd; int sko; int i; fd = open(argv[n], O_RDONLY); sko = accept(sk, NULL, 0); fstat(fd, &st); size = st.st_size; sendfile(sko, fd, &offset, size); size = read(sko, buf, sizeof(buf)); for (i = 0; i < size; i++) printf("%2.2x", buf[i]); printf(" %s\n", argv[n]); close(fd); close(sko); } exit(0); } Test below is done using official linux patch files. First result is with a software based md5sum. Second result is with the program above. root@vgoip:~# ls -l patch-3.6.* -rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz -rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz 5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz After investivation, it appears that sendfile() sends the files by blocks of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each block, the SPLICE_F_MORE flag is missing, therefore the hashing operation is reset as if it was the end of the file. This patch adds SPLICE_F_MORE to the flags when more data is pending. With the patch applied, we get the correct sums: root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-04-15dax: unify ext2/4_{dax,}_file_operationsBoaz Harrosh
The original dax patchset split the ext2/4_file_operations because of the two NULL splice_read/splice_write in the dax case. In the vfs if splice_read/splice_write are NULL we then call default_splice_read/write. What we do here is make generic_file_splice_read aware of IS_DAX() so the original ext2/4_file_operations can be used as is. For write it appears that iter_file_splice_write is just fine. It uses the regular f_op->write(file,..) or new_sync_write(file, ...). Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-11vmsplice_to_user(): switch to import_iovec()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25fs: move struct kiocb to fs.hChristoph Hellwig
struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29fs: add vfs_iter_{read,write} helpersChristoph Hellwig
Simple helpers that pass an arbitrary iov_iter to filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29new helper: iov_iter_bvec()Al Viro
similar to iov_iter_kvec(), for ITER_BVEC ones Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-24vfs: export do_splice_direct() to modulesMiklos Szeredi
Export do_splice_direct() to modules. Needed by overlay filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-06-12Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linusAl Viro
Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12kill generic_file_splice_write()Al Viro
no callers left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12fs/splice.c: remove unneeded exportsAl Viro
ocfs2 was using a bunch of splice.c guts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12->splice_write() via ->write_iter()Al Viro
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-28vfs: fix vmplice_to_user()Miklos Szeredi
Commit 6130f5315ee8 "switch vmsplice_to_user() to copy_page_to_iter()" in v3.15-rc1 broke vmsplice(2). This patch fixes two bugs: - count is not initialized to a proper value, which resulted in no data being copied - if rw_copy_check_uvector() returns negative then the iov might be leaked. Tested OK. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06start adding the tag to iov_iterAl Viro
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01switch vmsplice_to_user() to copy_page_to_iter()Al Viro
I've switched the sanity checks on iovec to rw_copy_check_uvector(); we might need to do a local analog, if any behaviour differences are not actually bugfixes here... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01pipe: kill ->map() and ->unmap()Al Viro
all pipe_buffer_operations have the same instances of those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-22fuse: fix pipe_buf_operationsMiklos Szeredi
Having this struct in module memory could Oops when if the module is unloaded while the buffer still persists in a pipe. Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal merge them into nosteal_pipe_buf_ops (this is the same as default_pipe_buf_ops except stealing the page from the buffer is not allowed). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
2013-10-24file->f_op is never NULL...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second set of VFS changes from Al Viro: "Assorted f_pos race fixes, making do_splice_direct() safe to call with i_mutex on parent, O_TMPFILE support, Jeff's locks.c series, ->d_hash/->d_compare calling conventions changes from Linus, misc stuff all over the place." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) Document ->tmpfile() ext4: ->tmpfile() support vfs: export lseek_execute() to modules lseek_execute() doesn't need an inode passed to it block_dev: switch to fixed_size_llseek() cpqphp_sysfs: switch to fixed_size_llseek() tile-srom: switch to fixed_size_llseek() proc_powerpc: switch to fixed_size_llseek() ubi/cdev: switch to fixed_size_llseek() pci/proc: switch to fixed_size_llseek() isapnp: switch to fixed_size_llseek() lpfc: switch to fixed_size_llseek() locks: give the blocked_hash its own spinlock locks: add a new "lm_owner_key" lock operation locks: turn the blocked_list into a hashtable locks: convert fl_link to a hlist_node locks: avoid taking global lock if possible when waking up blocked waiters locks: protect most of the file_lock handling with i_lock locks: encapsulate the fl_link list handling locks: make "added" in __posix_lock_file a bool ...
2013-06-29splice: lift checks from do_splice_from() into callersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29lift file_*_write out of do_splice_direct()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>