author    Eric Dumazet <edumazet@google.com>    2019-08-06 17:09:14 +0200
committer Jason Liu <jason.hui.liu@nxp.com>     2019-09-06 15:49:42 +0800
commita5982ca5602bdc57db4214971b55e01e6d2ea353 (patch)
treee24d41b80bc11f164dd3db9a1a924a3d87a98c71 /net
parentb69ce5a2cfca586f9c3e7d784bfc2ba402a0515b (diff)
tcp: be more careful in tcp_fragment()
commit b617158dc096709d8600c53b6052144d12b89fab upstream.

Some applications set tiny SO_SNDBUF values and expect TCP to just work.
Recent patches to address CVE-2019-11478 broke them in case of losses,
since retransmits might be prevented.

We should allow these flows to make progress.

This patch allows the first and last skb in the retransmit queue to be
split even if memory limits are hit. It also adds some room to account
for the fact that tcp_sendmsg() and tcp_sendpage() might overshoot
sk_wmem_queued by about one full TSO skb (64KB size). Note this
allowance was already present in stable backports for kernels < 4.15.

Note for < 4.15 backports: tcp_rtx_queue_tail() will probably look like:

    static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
    {
        struct sk_buff *skb = tcp_send_head(sk);

        return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
    }

Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8c3088f895a0cfdf630973a9cd32f5e085eb973c)
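For context, the flows the changelog describes are ordinary TCP senders that
simply shrink their socket send buffer with setsockopt(SO_SNDBUF). The
following minimal userspace sketch (not part of the patch; the address, port
and buffer size are illustrative assumptions) creates such a flow:

    /* Hypothetical illustration only: a TCP sender with a tiny SO_SNDBUF,
     * the kind of flow the changelog says must keep making progress
     * even when retransmits hit the tcp_fragment() memory limit.
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int sndbuf = 4096;  /* tiny send buffer, as in the reports */
        struct sockaddr_in dst = {
            .sin_family = AF_INET,
            .sin_port   = htons(8080),  /* example port */
        };

        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* example address */

        /* The kernel doubles this value internally (see socket(7)). */
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0) {
            char buf[65536];
            memset(buf, 'x', sizeof(buf));
            /* Bulk write; on losses, retransmits must still be able to
             * split skbs at the head/tail of the retransmit queue. */
            write(fd, buf, sizeof(buf));
        }
        close(fd);
        return 0;
    }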
Diffstat (limited to 'net')
-rw-r--r--    net/ipv4/tcp_output.c    11
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a5960b9b6741..a99086bf26ea 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1264,6 +1264,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *buff;
int nsize, old_factor;
+ long limit;
int nlen;
u8 flags;
@@ -1274,7 +1275,15 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
if (nsize < 0)
nsize = 0;
- if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 0x20000)) {
+ /* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+ * We need some allowance to not penalize applications setting small
+ * SO_SNDBUF values.
+ * Also allow first and last skb in retransmit queue to be split.
+ */
+ limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+ if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+ skb != tcp_rtx_queue_head(sk) &&
+ skb != tcp_rtx_queue_tail(sk))) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
return -ENOMEM;
}
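For reference, the new allowance is close in size to the old hard-coded
0x20000 (128KB) slack, but is expressed as two maximal TSO skbs so it tracks
the overshoot tcp_sendmsg()/tcp_sendpage() can actually cause. The sketch
below is a userspace approximation only: the per-skb overhead constant is an
assumption standing in for the kernel's sizeof(struct sk_buff) and
sizeof(struct skb_shared_info) rounded up by SKB_DATA_ALIGN().

    /* Rough comparison of the old and new slack in tcp_fragment().
     * SKB_OVERHEAD_GUESS is an assumed value for illustration.
     */
    #include <stdio.h>

    #define GSO_MAX_SIZE        65536U
    #define SKB_OVERHEAD_GUESS  (256U + 320U)   /* assumed sk_buff + shared_info */
    #define SKB_TRUESIZE(x)     ((x) + SKB_OVERHEAD_GUESS)

    int main(void)
    {
        unsigned int old_slack = 0x20000;                    /* 128 KB constant */
        unsigned int new_slack = 2 * SKB_TRUESIZE(GSO_MAX_SIZE);

        printf("old slack: %u bytes\n", old_slack);
        printf("new slack: %u bytes (~2 full TSO skbs)\n", new_slack);
        return 0;
    }

Note that the check compares sk_wmem_queued >> 1 against the limit, so queued
memory may reach roughly twice (sk_sndbuf + slack) before -ENOMEM is returned,
and the head and tail of the retransmit queue are exempt so a stalled flow can
always split at least those skbs and keep retransmitting.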