summaryrefslogtreecommitdiff
path: root/net
diff options
context:
space:
mode:
authorEric Dumazet <eric.dumazet@gmail.com>2012-04-03 09:37:01 +0000
committerPaul Gortmaker <paul.gortmaker@windriver.com>2014-02-10 16:10:51 -0500
commitc37c78f51f66e519724c27a291f434df079fc8ee (patch)
treec02be52428487c77eeba279849ff9c8aacf40048 /net
parentb34a6044b346f2277fe6e2318db1da2abfd1bf65 (diff)
tcp: allow splice() to build full TSO packets
commit 2f53384424251c06038ae612e56231b96ab610ee upstream. vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Diffstat (limited to 'net')
-rw-r--r--net/ipv4/tcp.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3a8cbf72b06e..cea0a9223c5d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -849,7 +849,7 @@ wait_for_memory:
}
out:
- if (copied)
+ if (copied && !(flags & MSG_MORE))
tcp_push(sk, flags, mss_now, tp->nonagle);
return copied;