summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2008-08-06Bluetooth: Signal user-space for HIDP and BNEP socket errorsMarcel Holtmann
commit ec8dab36e0738d3059980d144e34f16a26bbda7d upstream When using the HIDP or BNEP kernel support, the user-space needs to know if the connection has been terminated for some reasons. Wake up the application if that happens. Otherwise kernel and user-space are no longer on the same page and weird behaviors can happen. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01netfilter -stable: nf_conntrack_tcp: fix endless loopPatrick McHardy
netfilter: nf_conntrack_tcp: fix endless loop Upstream commit 6b69fe0: When a conntrack entry is destroyed in process context and destruction is interrupted by packet processing and the packet is an attempt to reopen a closed connection, TCP conntrack tries to kill the old entry itself and returns NF_REPEAT to pass the packet through the hook again. This may lead to an endless loop: TCP conntrack repeatedly finds the old entry, but can not kill it itself since destruction is already in progress, but destruction in process context can not complete since TCP conntrack is keeping the CPU busy. Drop the packet in TCP conntrack if we can't kill the connection ourselves to avoid this. Reported by: hemao77@gmail.com [ Kernel bugzilla #11058 ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-01tcp: Clear probes_out more aggressively in tcp_ack().David S. Miller
[ Upstream commit 4b53fb67e385b856a991d402096379dab462170a ] This is based upon an excellent bug report from Eric Dumazet. tcp_ack() should clear ->icsk_probes_out even if there are packets outstanding. Otherwise if we get a sequence of ACKs while we do have packets outstanding over and over again, we'll never clear the probes_out value and eventually think the connection is too sick and we'll reset it. This appears to be some "optimization" added to tcp_ack() in the 2.4.x timeframe. In 2.2.x, probes_out is pretty much always cleared by tcp_ack(). Here is Eric's original report: ---------------------------------------- Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost. In order to reproduce the problem, please try the following program on linux-2.6.25.* Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000 iptables -N SLOWLO iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT iptables -A SLOWLO -j DROP iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO Then run the attached program and see the output : # ./loop State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15) write(): Connection timed out wrote 890 bytes but was interrupted after 9 seconds ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455 Exiting read() because no data available (4000 ms timeout). read 860 bytes While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15) tcpdump : 15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257 15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257 15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257 15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257 15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257 15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257 15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257 15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257 15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257 15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257 15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257 15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257 15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257 15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257 15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257 15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257 15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257 15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257 15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257 15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257 15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257 15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257 15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257 15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257 15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257 15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257 15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257 15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257 15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257 15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257 15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257 15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257 15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257 15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257 15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257 15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0 Source of program : /* * small producer/consumer program. * setup a listener on 127.0.0.1:12000 * Forks a child * child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms * Father accepts connection, and read all data */ #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <stdio.h> #include <time.h> #include <sys/poll.h> int port = 12000; char buffer[4096]; int main(int argc, char *argv[]) { int lfd = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in socket_address; time_t t0, t1; int on = 1, sfd, res; unsigned long total = 0; socklen_t alen = sizeof(socket_address); pid_t pid; time(&t0); socket_address.sin_family = AF_INET; socket_address.sin_port = htons(port); socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK); if (lfd == -1) { perror("socket()"); return 1; } setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int)); if (bind(lfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) { perror("bind"); close(lfd); return 1; } if (listen(lfd, 1) == -1) { perror("listen()"); close(lfd); return 1; } pid = fork(); if (pid == 0) { int i, cfd = socket(AF_INET, SOCK_STREAM, 0); close(lfd); if (connect(cfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) { perror("connect()"); return 1; } for (i = 0 ; ;) { res = write(cfd, "blablabla\n", 10); if (res > 0) total += res; else if (res == -1) { perror("write()"); break; } else break; usleep(100000); if (++i == 10) { system("ss -on dst 127.0.0.1:12000"); i = 0; } } time(&t1); fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0)); system("ss -on | grep 127.0.0.1:12000"); close(cfd); return 0; } sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen); if (sfd == -1) { perror("accept"); return 1; } close(lfd); while (1) { struct pollfd pfd[1]; pfd[0].fd = sfd; pfd[0].events = POLLIN; if (poll(pfd, 1, 4000) == 0) { fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n"); break; } res = read(sfd, buffer, sizeof(buffer)); if (res > 0) total += res; else if (res == 0) break; else perror("read()"); } fprintf(stderr, "read %lu bytes\n", total); close(sfd); return 0; } ---------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-28udplite: Protection against coverage value wrap-aroundGerrit Renker
[ Upstream commit 47112e25da41d9059626033986dc3353e101f815 ] This patch clamps the cscov setsockopt values to a maximum of 0xFFFF. Setsockopt values greater than 0xffff can cause an unwanted wrap-around. Further, IPv6 jumbograms are not supported (RFC 3838, 3.5), so that values greater than 0xffff are not even useful. Further changes: fixed a typo in the documentation. [ Add USHORT_MAX from upstream to linux/kernel.h -DaveM ] Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-28xfrm: fix fragmentation for ipv4 xfrm tunnelSteffen Klassert
[ Upstream commit fe833fca2eac6b3d3ad5e35f44ad4638362f1da8 ] When generating the ip header for the transformed packet we just copy the frag_off field of the ip header from the original packet to the ip header of the new generated packet. If we receive a packet as a chain of fragments, all but the last of the new generated packets have the IP_MF flag set. We have to mask the frag_off field to only keep the IP_DF flag from the original packet. This got lost with git commit 36cf9acf93e8561d9faec24849e57688a81eb9c5 ("[IPSEC]: Separate inner/outer mode processing on output") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-28raw: Restore /proc/net/raw correct behaviorEric Dumazet
[ Upstream commit 68be802cd5ad040fe8cfa33ce3031405df2d9117 ] I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators. I believe this was introduced by commit 8cd850efa4948d57a2ed836911cfd1ab299e89c6 ([RAW]: Cleanup IPv4 raw_seq_show.) This trivial patch restores correct behavior, and applies to current Linus tree (should also be applied to stable tree as well.) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-28ipv6: use timer pendingStephen Hemminger
[ Upstream commit 847499ce71bdcc8fc542062df6ebed3e596608dd ] This fixes the bridge reference count problem and cleanups ipv6 FIB timer management. Don't use expires field, because it is not a proper way to test, instead use timer_pending(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACKJozsef Kadlecsik
Upstream commit 84ebe1c: Lost connections was reported by Thomas Bätzler (running 2.6.25 kernel) on the netfilter mailing list (see the thread "Weird nat/conntrack Problem with PASV FTP upload"). He provided tcpdump recordings which helped to find a long lingering bug in conntrack. In TCP connection tracking, checking the lower bound of valid ACK could lead to mark valid packets as INVALID because: - We have got a "higher or equal" inequality, but the test checked the "higher" condition only; fixed. - If the packet contains a SACK option, it could occur that the ACK value was before the left edge of our (S)ACK "window": if a previous packet from the other party intersected the right edge of the window of the receiver, we could move forward the window parameters beyond accepting a valid ack. Therefore in this patch we check the rightmost SACK edge instead of the ACK value in the lower bound of valid (S)ACK test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-24can: add sanity checksOliver Hartkopp
commit 7f2d38eb7a42bea1c1df51bbdaa2ca0f0bdda07f upstream Even though the CAN netlayer only deals with CAN netdevices, the netlayer interface to the userspace and to the device layer should perform some sanity checks. This patch adds several sanity checks that mainly prevent userspace apps to send broken content into the system that may be misinterpreted by some other userspace application. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Acked-by: Andre Naujoks <nautsch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-24mac80211: detect driver tx bugsJohannes Berg
When a driver rejects a frame in it's ->tx() callback, it must also stop queues, otherwise mac80211 can go into a loop here. Detect this situation and abort the loop after five retries, warning about the driver bug. This patch was added to mainline as commit ef3a62d272f033989e83eb1f26505f93f93e3e69. Thanks to Larry Finger <Larry.Finger@lwfinger.net> for doing the -stable port. Cc: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-24sctp: Make sure N * sizeof(union sctp_addr) does not overflow.David S. Miller
commit 735ce972fbc8a65fb17788debd7bbe7b4383cc62 upstream As noticed by Gabriel Campana, the kmalloc() length arg passed in by sctp_getsockopt_local_addrs_old() can overflow if ->addr_num is large enough. Therefore, enforce an appropriate limit. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21nf_conntrack_h323: fix memory leak in module initialization error pathPatrick McHardy
netfilter: nf_conntrack_h323: fix memory leak in module initialization error path Upstream commit 8a548868db62422113104ebc658065e3fe976951 Properly free h323_buffer when helper registration fails. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21nf_conntrack_h323: fix module unload crashPatrick McHardy
netfilter: nf_conntrack_h323: fix module unload crash Upstream commit a56b8f81580761c65e4d8d0c04ac1cb7a788bdf1 The H.245 helper is not registered/unregistered, but assigned to connections manually from the Q.931 helper. This means on unload existing expectations and connections using the helper are not cleaned up, leading to the following oops on module unload: CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c Oops[#1]: Cpu 0 $ 0 : 00000000 00000000 00000004 c00a67f0 $ 4 : 802a5ad0 81657e00 00000000 00000000 $ 8 : 00000008 801461c8 00000000 80570050 $12 : 819b0280 819b04b0 00000006 00000000 $16 : 802a5a60 80000000 80b46000 80321010 $20 : 00000000 00000004 802a5ad0 00000001 $24 : 00000000 802257a8 $28 : 802a4000 802a59e8 00000004 801d4e7c Hi : 0000000b Lo : 00506320 epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P ra : 801d4e7c nf_iterate+0xbc/0x130 Status: 1000f403 KERNEL EXL IE Cause : 00800008 BadVA : c00a6828 PrId : 00019374 Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc Process swapper (pid: 0, threadinfo=802a4000, task=802a6000) Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004 00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010 801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002 80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000 81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001 ... Call Trace: [<801e7d98>] ip_finish_output+0x0/0x23c [<801d4e7c>] nf_iterate+0xbc/0x130 [<801d4e7c>] nf_iterate+0xbc/0x130 [<801e7d98>] ip_finish_output+0x0/0x23c [<801e7d98>] ip_finish_output+0x0/0x23c [<801d4f8c>] nf_hook_slow+0x9c/0x1a4 One way to fix this would be to split helper cleanup from the unregistration function and invoke it for the H.245 helper, but since ctnetlink needs to be able to find the helper for synchonization purposes, a better fix is to register it normally and make sure its not assigned to connections during helper lookup. The missing l3num initialization is enough for this, this patch changes it to use AF_UNSPEC to make it more explicit though. Reported-by: liannan <liannan@twsz.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()Patrick McHardy
netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() Upstream commit ceeff7541e5a4ba8e8d97ffbae32b3f283cb7a3f When creation of a new conntrack entry in ctnetlink fails after having set up the NAT mappings, the conntrack has an extension area allocated that is not getting properly destroyed when freeing the conntrack again. This means the NAT extension is still in the bysource hash, causing a crash when walking over the hash chain the next time: BUG: unable to handle kernel paging request at 00120fbd IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1) EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1 EIP is at nf_nat_setup_info+0x221/0x58a EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000 ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000) Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000 00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3 00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674 Call Trace: [<c038e728>] nla_parse+0x5c/0xb0 [<c0397c1b>] ctnetlink_change_status+0x190/0x1c6 [<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f [<c0119aee>] update_curr+0x3d/0x52 [<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8 [<c0390228>] nfnetlink_rcv_msg+0x18/0xd8 [<c0390210>] nfnetlink_rcv_msg+0x0/0xd8 [<c038d2ce>] netlink_rcv_skb+0x2d/0x71 [<c0390205>] nfnetlink_rcv+0x19/0x24 [<c038d0f5>] netlink_unicast+0x1b3/0x216 ... Move invocation of the extension destructors to nf_conntrack_free() to fix this problem. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10875 Reported-and-Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16mac80211: send association event on IBSS createDan Williams
patch 507b06d0622480f8026d49a94f86068bb0fd6ed6 upstream Otherwise userspace has no idea the IBSS creation succeeded. Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp))Ilpo Järvinen
[ upstream commit: 8aca6cb1179ed9bef9351028c8d8af852903eae2 ] It is possible that this skip path causes TCP to end up into an invalid state where ca_state was left to CA_Open while some segments already came into sacked_out. If next valid ACK doesn't contain new SACK information TCP fails to enter into tcp_fastretrans_alert(). Thus at least high_seq is set incorrectly to a too high seqno because some new data segments could be sent in between (and also, limited transmit is not being correctly invoked there). Reordering in both directions can easily cause this situation to occur. I guess we would want to use tcp_moderate_cwnd(tp) there as well as it may be possible to use this to trigger oversized burst to network by sending an old ACK with huge amount of SACK info, but I'm a bit unsure about its effects (mainly to FlightSize), so to be on the safe side I just currently fixed it minimally to keep TCP's state consistent (obviously, such nasty ACKs have been possible this far). Though it seems that FlightSize is already underestimated by some amount, so probably on the long term we might want to trigger recovery there too, if appropriate, to make FlightSize calculation to resemble reality at the time when the losses where discovered (but such change scares me too much now and requires some more thinking anyway how to do that as it likely involves some code shuffling). This bug was found by Brian Vowell while running my TCP debug patch to find cause of another TCP issue (fackets_out miscount). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16tcp FRTO: work-around inorder receiversIlpo Järvinen
[ upstream commit: 79d44516b4b178ffb6e2159c75584cfcfc097914 ] If receiver consumes segments successfully only in-order, FRTO fallback to conventional recovery produces RTO loop because FRTO's forward transmissions will always get dropped and need to be resent, yet by default they're not marked as lost (which are the only segments we will retransmit in CA_Loss). Price to pay about this is occassionally unnecessarily retransmitting the forward transmission(s). SACK blocks help a bit to avoid this, so it's mainly a concern for NewReno case though SACK is not fully immune either. This change has a side-effect of fixing SACKFRTO problem where it didn't have snd_nxt of the RTO time available anymore when fallback become necessary (this problem would have only occured when RTO would occur for two or more segments and ECE arrives in step 3; no need to figure out how to fix that unless the TODO item of selective behavior is considered in future). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp FRTO: SACK variant is errorneously used with NewRenoIlpo Järvinen
[ upstream commit: 62ab22278308a40bcb7f4079e9719ab8b7fe11b5 ] Note: there's actually another bug in FRTO's SACK variant, which is the causing failure in NewReno case because of the error that's fixed here. I'll fix the SACK case separately (it's a separate bug really, though related, but in order to fix that I need to audit tp->snd_nxt usage a bit). There were two places where SACK variant of FRTO is getting incorrectly used even if SACK wasn't negotiated by the TCP flow. This leads to incorrect setting of frto_highmark with NewReno if a previous recovery was interrupted by another RTO. An eventual fallback to conventional recovery then incorrectly considers one or couple of segments as forward transmissions though they weren't, which then are not LOST marked during fallback making them "non-retransmittable" until the next RTO. In a bad case, those segments are really lost and are the only one left in the window. Thus TCP needs another RTO to continue. The next FRTO, however, could again repeat the same events making the progress of the TCP flow extremely slow. In order for these events to occur at all, FRTO must occur again in FRTOs step 3 while the key segments must be lost as well, which is not too likely in practice. It seems to most frequently with some small devices such as network printers that *seem* to accept TCP segments only in-order. In cases were key segments weren't lost, things get automatically resolved because those wrongly marked segments don't need to be retransmitted in order to continue. I found a reproducer after digging up relevant reports (few reports in total, none at netdev or lkml I know of), some cases seemed to indicate middlebox issues which seems now to be a false assumption some people had made. Bugzilla #10063 _might_ be related. Damon L. Chesser <damon@damtek.com> had a reproducable case and was kind enough to tcpdump it for me. With the tcpdump log it was quite trivial to figure out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp FRTO: Fix fallback to conventional recoveryIlpo Järvinen
[ upstream commit: a1c1f281b84a751fdb5ff919da3b09df7297619f ] It seems that commit 009a2e3e4ec ("[TCP] FRTO: Improve interoperability with other undo_marker users") run into another land-mine which caused fallback to conventional recovery to break: 1. Cumulative ACK arrives after FRTO retransmission 2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp which should be kept like in CA_Loss state it would be 3. undo_marker change allowed tcp_packet_delayed to return true because of the cleared retrans_stamp once FRTO is terminated causing LossUndo to occur, which means all loss markings FRTO made are reverted. This means that the conventional recovery basically recovered one loss per RTT, which is not that efficient. It was quite unobvious that the undo_marker change broken something like this, I had a quite long session to track it down because of the non-intuitiviness of the bug (luckily I had a trivial reproducer at hand and I was also able to learn to use kprobes in the process as well :-)). This together with the NewReno+FRTO fix and FRTO in-order workaround this fixes Damon's problems, this and the first mentioned are enough to fix Bugzilla #10063. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Tested-by: Sebastian Hyrwall <zibbe@cisko.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp: fix skb vs fack_count out-of-sync conditionIlpo Järvinen
[ upstream commit: a6604471db5e7a33474a7f16c64d6b118fae3e74 ] This bug is able to corrupt fackets_out in very rare cases. In order for this to cause corruption: 1) DSACK in the middle of previous SACK block must be generated. 2) In order to take that particular branch, part or all of the DSACKed segment must already be SACKed so that we have that in cache in the first place. 3) The new info must be top enough so that fackets_out will be updated on this iteration. ...then fack_count is updated while skb wasn't, then we walk again that particular segment thus updating fack_count twice for a single skb and finally that value is assigned to fackets_out by tcp_sacktag_one. It is safe to call tcp_sacktag_one just once for a segment (at DSACK), no need to call again for plain SACK. Potential problem of the miscount are limited to premature entry to recovery and to inflated reordering metric (which could even cancel each other out in the most the luckiest scenarios :-)). Both are quite insignificant in worst case too and there exists also code to reset them (fackets_out once sacked_out becomes zero and reordering metric on RTO). This has been reported by a number of people, because it occurred quite rarely, it has been very evasive. Andy Furniss was able to get it to occur couple of times so that a bit more info was collected about the problem using a debug patch, though it still required lot of checking around. Thanks also to others who have tried to help here. This is listed as Bugzilla #10346. The bug was introduced by me in commit 68f8353b48 ([TCP]: Rewrite SACK block processing & sack_recv_cache use), I probably thought back then that there's need to scan that entry twice or didn't dare to make it go through it just once there. Going through twice would have required restoring fack_count after the walk but as noted above, I chose to drop the additional walk step altogether here. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp: Limit cwnd growth when deferring for GSOJohn Heffner
[ upstream commit: 246eb2af060fc32650f07203c02bdc0456ad76c7 ] This fixes inappropriately large cwnd growth on sender-limited flows when GSO is enabled, limiting cwnd growth to 64k. [ Backport to 2.6.25 by replacing sk->sk_gso_max_size with 65536 -DaveM ] Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp: Allow send-limited cwnd to grow up to max_burst when gso disabledJohn Heffner
[ upstream commit: ce447eb91409225f8a488f6b7b2a1bdf7b2d884f ] This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow up to tcp_max_burst() even when sk_can_gso() is false, or when sysctl_tcp_tso_win_divisor != 0. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16tcp: TCP connection times out if ICMP frag needed is delayedSridhar Samudrala
[ upstream commit: 7d227cd235c809c36c847d6a597956ad9e9d2bae ] We are seeing an issue with TCP in handling an ICMP frag needed message that is received after net.ipv4.tcp_retries1 retransmits. The default value of retries1 is 3. So if the path mtu changes and ICMP frag needed is lost for the first 3 retransmits or if it gets delayed until 3 retransmits are done, TCP doesn't update MSS correctly and continues to retransmit the orginal message until it timesout after tcp_retries2 retransmits. I am seeing this issue even with the latest 2.6.25.4 kernel. In tcp_retransmit_timer(), when retransmits counter exceeds tcp_retries1 value, the dst cache entry of the socket is reset. At this time, if we receive an ICMP frag needed message, the dst entry gets updated with the new MTU, but the TCP sockets dst_cache entry remains NULL. So the next time when we try to retransmit after the ICMP frag needed is received, tcp_retransmit_skb() gets called. Here the cur_mss value is calculated at the start of the routine with a NULL sk_dst_cache. Instead we should call tcp_current_mss after the rebuild_header that caches the dst entry with the updated mtu. Also the rebuild_header should be called before tcp_fragment so that skb is fragmented if the mss goes down. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16vlan: Correctly handle device notifications for layered VLAN devicesPatrick McHardy
[ upstream commit: 81d85346b3fcd8b3167eac8b5fb415a210bd4345 ] Commit 30688a9 ([VLAN]: Handle vlan devices net namespace changing) changed the device notifier to special-case notifications for VLAN devices, effectively disabling state propagation to underlying VLAN devices. This is needed for layered VLANs though, so restore the original behaviour. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ipsec: Use the correct ip_local_out functionHerbert Xu
[ upstream commit: 1ac06e0306d0192a7a4d9ea1c9e06d355ce7e7d3 ] Because the IPsec output function xfrm_output_resume does its own dst_output call it should always call __ip_local_output instead of ip_local_output as the latter may invoke dst_output directly. Otherwise the return values from nf_hook and dst_output may clash as they both use the value 1 but for different purposes. When that clash occurs this can cause a packet to be used after it has been freed which usually leads to a crash. Because the offending value is only returned from dst_output with qdiscs such as HTB, this bug is normally not visible. Thanks to Marco Berizzi for his perseverance in tracking this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16net_sched: cls_api: fix return value for non-existant classifiersPatrick McHardy
[ upstream commit: f2df824948d559ea818e03486a8583e42ea6ab37 ] cls_api should return ENOENT when the requested classifier doesn't exist. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16net: Fix call to ->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags()David Woodhouse
[ upstream commit: 0e91796eb46e29edc791131c832a2232bcaed9dd ] Am I just being particularly dim today, or can the call to dev->change_rx_flags(dev, IFF_MULTICAST) in dev_change_flags() never happen? We've just set dev->flags = flags & IFF_MULTICAST, effectively. So the condition '(dev->flags ^ flags) & IFF_MULTICAST' is _never_ going to be true. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16can: Fix copy_from_user() results interpretationSam Ravnborg
[ Upstream commit: 3f91bd420a955803421f2db17b2e04aacfbb2bb8 ] Both copy_to_ and _from_user return the number of bytes, that failed to reach their destination, not the 0/-EXXX values. Based on patch from Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16bluetooth: rfcomm_dev_state_change deadlock fixDave Young
commit 537d59af73d894750cff14f90fe2b6d77fbab15b in mainline There's logic in __rfcomm_dlc_close: rfcomm_dlc_lock(d); d->state = BT_CLOSED; d->state_changed(d, err); rfcomm_dlc_unlock(d); In rfcomm_dev_state_change, it's possible that rfcomm_dev_put try to take the dlc lock, then we will deadlock. Here fixed it by unlock dlc before rfcomm_dev_get in rfcomm_dev_state_change. why not unlock just before rfcomm_dev_put? it's because there's another problem. rfcomm_dev_get/rfcomm_dev_del will take rfcomm_dev_lock, but in rfcomm_dev_add the lock order is : rfcomm_dev_lock --> dlc lock so I unlock dlc before the taken of rfcomm_dev_lock. Actually it's a regression caused by commit 1905f6c736cb618e07eca0c96e60e3c024023428 ("bluetooth : __rfcomm_dlc_close lock fix"), the dlc state_change could be two callbacks : rfcomm_sk_state_change and rfcomm_dev_state_change. I missed the rfcomm_sk_state_change that time. Thanks Arjan van de Ven <arjan@linux.intel.com> for the effort in commit 4c8411f8c115def968820a4df6658ccfd55d7f1a ("bluetooth: fix locking bug in the rfcomm socket cleanup handling") but he missed the rfcomm_dev_state_change lock issue. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16bluetooth: fix locking bug in the rfcomm socket cleanup handlingArjan van de Ven
[ Upstream commit: 7dccf1f4e1696c79bff064c3770867cc53cbc71c ] in net/bluetooth/rfcomm/sock.c, rfcomm_sk_state_change() does the following operation: if (parent && sock_flag(sk, SOCK_ZAPPED)) { /* We have to drop DLC lock here, otherwise * rfcomm_sock_destruct() will dead lock. */ rfcomm_dlc_unlock(d); rfcomm_sock_kill(sk); rfcomm_dlc_lock(d); } } which is fine, since rfcomm_sock_kill() will call sk_free() which will call rfcomm_sock_destruct() which takes the rfcomm_dlc_lock()... so far so good. HOWEVER, this assumes that the rfcomm_sk_state_change() function always gets called with the rfcomm_dlc_lock() taken. This is the case for all but one case, and in that case where we don't have the lock, we do a double unlock followed by an attempt to take the lock, which due to underflow isn't going anywhere fast. This patch fixes this by moving the stragling case inside the lock, like the other usages of the same call are doing in this code. This was found with the help of the www.kerneloops.org project, where this deadlock was observed 51 times at this point in time: http://www.kerneloops.org/search.php?search=rfcomm_sock_destruct Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ax25: Fix NULL pointer dereference and lockup.Jarek Poplawski
[ Upstream commit: 7dccf1f4e1696c79bff064c3770867cc53cbc71c ] There is only one function in AX25 calling skb_append(), and it really looks suspicious: appends skb after previously enqueued one, but in the meantime this previous skb could be removed from the queue. This patch Fixes it the simple way, so this is not fully compatible with the current method, but testing hasn't shown any problems. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16af_key: Fix selector family initialization.Kazunori MIYAZAWA
[ upstream commit: 4da5105687e0993a3bbdcffd89b2b94d9377faab ] This propagates the xfrm_user fix made in commit bcf0dda8d2408fe1c1040cdec5a98e5fcad2ac72 ("[XFRM]: xfrm_user: fix selector family initialization") Based upon a bug report from, and tested by, Alan Swanson. Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09netfilter: nf_conntrack_ipv6: fix inconsistent lock state in ↵Patrick McHardy
nf_ct_frag6_gather() upstream commit: b9c698964614f71b9c8afeca163a945b4c2e2d20 [ 63.531438] ================================= [ 63.531520] [ INFO: inconsistent lock state ] [ 63.531520] 2.6.26-rc4 #7 [ 63.531520] --------------------------------- [ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage. [ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0 [ 63.531520] {softirq-on-W} state was registered at: [ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080 [ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0 [ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40 [ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910 ... According to this and another similar lockdep report inet_fragment locks are taken from nf_ct_frag6_gather() with softirqs enabled, but these locks are mainly used in softirq context, so disabling BHs is necessary. Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09netfilter: xt_connlimit: fix accouning when receive RST packet in ↵Patrick McHardy
ESTABLISHED state upstream commit: d2ee3f2c4b1db1320c1efb4dcaceeaf6c7e6c2d3 In xt_connlimit match module, the counter of an IP is decreased when the TCP packet is go through the chain with ip_conntrack state TW. Well, it's very natural that the server and client close the socket with FIN packet. But when the client/server close the socket with RST packet(using so_linger), the counter for this connection still exsit. The following patch can fix it which is based on linux-2.6.25.4 Signed-off-by: Dong Wei <dwei.zh@gmail.com> Acked-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09netfilter: nf_conntrack_expect: fix error path unwind in ↵Patrick McHardy
nf_conntrack_expect_init() upstream commit: 12293bf91126ad253a25e2840b307fdc7c2754c3 Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09netfilter: xt_iprange: module aliases for xt_iprangePhil Oester
upstream commit: 01b7a314291b2ef56ad718ee1374a1bac4768b29 Using iptables 1.3.8 with kernel 2.6.25, rules which include '-m iprange' don't automatically pull in xt_iprange module. Below patch adds module aliases to fix that. Patch against latest -git, but seems like a good candidate for -stable also. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-06asn1: additional sanity checking during BER decoding (CVE-2008-1673)Chris Wright
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5 - Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-15{nfnetlink, ip, ip6}_queue: fix skb_over_panic when enlarging packetsArnaud Ebalard
[NETFILTER]: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets From: Arnaud Ebalard <arno@natisbad.org> Upstream commit 9a732ed6d: While reinjecting *bigger* modified versions of IPv6 packets using libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too) but I get the following on recents kernels (2.6.25, trace below is against today's net-2.6 git tree): skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0 ------------[ cut here ]------------ invalid opcode: 0000 [#1] PREEMPT Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000) Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80 f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60 c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c Call Trace: [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0 [<c04cdbfc>] ? skb_put+0x3c/0x40 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0 [<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160 [<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160 [<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160 [<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0 [<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30 [<c04f8c73>] ? netlink_unicast+0x243/0x2b0 [<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70 [<c04f9406>] ? netlink_sendmsg+0x1c6/0x270 [<c04c8244>] ? sock_sendmsg+0xc4/0xf0 [<c011970d>] ? set_next_entity+0x1d/0x50 [<c0133a80>] ? autoremove_wake_function+0x0/0x40 [<c0118f9e>] ? __wake_up_common+0x3e/0x70 [<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280 [<c011d308>] ? __wake_up+0x68/0x70 [<c02cea47>] ? copy_from_user+0x37/0x70 [<c04cfd7c>] ? verify_iovec+0x2c/0x90 [<c04c837a>] ? sys_sendmsg+0x10a/0x230 [<c011967a>] ? __dequeue_entity+0x2a/0xa0 [<c011970d>] ? set_next_entity+0x1d/0x50 [<c0345397>] ? pty_write+0x47/0x60 [<c033d59b>] ? tty_default_put_char+0x1b/0x20 [<c011d2e9>] ? __wake_up+0x49/0x70 [<c033df99>] ? tty_ldisc_deref+0x39/0x90 [<c033ff20>] ? tty_write+0x1a0/0x1b0 [<c04c93af>] ? sys_socketcall+0x7f/0x260 [<c0102ff9>] ? sysenter_past_esp+0x6a/0x91 [<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0 ======================= Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0 EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8 Looking at the code, I ended up in nfq_mangle() function (called by nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to the increased size of data passed to the function. AFAICT, it should ask for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the resulting sk_buff has not enough space to support the skb_put(skb, diff) call a few lines later, this results in the call to skb_over_panic(). The patch below asks for allocation of a copy with enough space for mangled packet and the same amount of headroom as old sk_buff. While looking at how the regression appeared (e2b58a67), I noticed the same pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects those locations too. Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things are ok (2.6.25 and today's net-2.6 git tree). Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15nf_conntrack: padding breaks conntrack hash on ARMPhilip Craig
[NETFILTER]: nf_conntrack: padding breaks conntrack hash on ARM Upstream commit 443a70d50: commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()" results in ARM platforms hashing uninitialised padding. This padding doesn't exist on other architectures. Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure everything is initialised. There were only 4 bytes that NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM). Signed-off-by: Philip Craig <philipc@snapgear.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15can: Fix can_send() handling on dev_queue_xmit() failuresOliver Hartkopp
[ Upstream commit: c2ab7ac225e29006b7117d6a9fe8f3be8d98b0c2 ] The tx packet counting and the local loopback of CAN frames should only happen in the case that the CAN frame has been enqueued to the netdevice tx queue successfully. Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15dccp: return -EINVAL on invalid feature lengthChris Wright
[ Upstream commit: 19443178fbfbf40db15c86012fc37df1a44ab857 ] dccp_feat_change() validates length and on error is returning 1. This happens to work since call chain is checking for 0 == success, but this is returned to userspace, so make it a real error value. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15ipvs: fix oops in backup for fwmark conn templatesJulian Anastasov
[ Upstream commit: 2ad17defd596ca7e8ba782d5fc6950ee0e99513c ] Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556 where conn templates with protocol=IPPROTO_IP can oops backup box. Result from ip_vs_proto_get() should be checked because protocol value can be invalid or unsupported in backup. But for valid message we should not fail for templates which use IPPROTO_IP. Also, add checks to validate message limits and connection state. Show state NONE for templates using IPPROTO_IP. Fix tested and confirmed by L0op8ack <l0op8ack@hotmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15sch_htb: remove from event queue in htb_parent_to_leaf()Jarek Poplawski
[ Upstream commit: 3ba08b00e0d8413d79be9cab8ec085ceb6ae6fd6 ] There is lack of removing a class from the event queue while changing from parent to leaf which can cause corruption of this rb tree. This patch fixes a bug introduced by my patch: "sch_htb: turn intermediate classes into leaves" commit: 160d5e10f87b1dc88fd9b84b31b1718e0fd76398. Many thanks to Jan 'yanek' Bortl for finding a way to reproduce this rare bug and narrowing the test case, which made possible proper diagnosing. This patch is recommended for all kernels starting from 2.6.20. Reported-and-tested-by: Jan 'yanek' Bortl <yanek@ya.bofh.cz> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-15XFRM: AUDIT: Fix flowlabel text format ambibuity.YOSHIFUJI Hideaki
[ Upstream commit: 27a27a2158f4fe56a29458449e880a52ddee3dc4 ] Flowlabel text format was not correct and thus ambiguous. For example, 0x00123 or 0x01203 are formatted as 0x123. This is not what audit tools want. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-09sit: Add missing kfree_skb() on pskb_may_pull() failure.David S. Miller
[ Upstream commit: 36ca34cc3b8335eb1fe8bd9a1d0a2592980c3f02 ] Noticed by Paul Marks <paul@pmarks.net>. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01IPSEC: Fix catch-22 with algorithm IDs above 31Herbert Xu
[ Upstream commit: c5d18e984a313adf5a1a4ae69e0b1d93cf410229 ] As it stands it's impossible to use any authentication algorithms with an ID above 31 portably. It just happens to work on x86 but fails miserably on ppc64. The reason is that we're using a bit mask to check the algorithm ID but the mask is only 32 bits wide. After looking at how this is used in the field, I have concluded that in the long term we should phase out state matching by IDs because this is made superfluous by the reqid feature. For current applications, the best solution IMHO is to allow all algorithms when the bit masks are all ~0. The following patch does exactly that. This bug was identified by IBM when testing on the ppc64 platform using the NULL authentication algorithm which has an ID of 251. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01net: Fix wrong interpretation of some copy_to_user() results.Pavel Emelyanov
[ Upstream commit: 653252c2302cdf2dfbca66a7e177f7db783f9efa ] I found some places, that erroneously return the value obtained from the copy_to_user() call: if some amount of bytes were not able to get to the user (this is what this one returns) the proper behavior is to return the -EFAULT error, not that number itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01rose: Socket lock was not released before returning to user spaceBernard Pidoux
[ Upstream commit: 43837b1e6c5aef803d57009a68db18df13e64892 ] ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ xfbbd/3683 is leaving the kernel with locks still held! 1 lock held by xfbbd/3683: #0: (sk_lock-AF_ROSE){--..}, at: [<c8cd1eb3>] rose_connect+0x73/0x420 [rose] INFO: task xfbbd:3683 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. xfbbd D 00000246 0 3683 3669 c6965ee0 00000092 c02c5c40 00000246 c0f6b5f0 c0f6b5c0 c0f6b5f0 c0f6b5c0 c0f6b614 c6965f18 c024b74b ffffffff c06ba070 00000000 00000000 00000001 c6ab07c0 c012d450 c0f6b634 c0f6b634 c7b5bf10 c0d6004c c7b5bf10 c6965f40 Call Trace: [<c024b74b>] lock_sock_nested+0x6b/0xd0 [<c012d450>] ? autoremove_wake_function+0x0/0x40 [<c02488f1>] sock_fasync+0x41/0x150 [<c0249e69>] sock_close+0x19/0x40 [<c0175d54>] __fput+0xb4/0x170 [<c0176018>] fput+0x18/0x20 [<c017300e>] filp_close+0x3e/0x70 [<c01744e9>] sys_close+0x69/0xb0 [<c0103bda>] sysenter_past_esp+0x5f/0xa5 ======================= INFO: lockdep is turned off. Signed-off-by: Bernard Pidoux <f6bvp@amsat.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01RTNETLINK: Fix bogus ASSERT_RTNL warningPatrick McHardy
[ Upstream commit: c9c1014b2bd014c7ec037bbb6f58818162fdb265 ] ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is held. This bogus warnings when running in atomic context, which f.e. happens when adding secondary unicast addresses through macvlan or vlan or when synchronizing multicast addresses from wireless devices. Mid-term we might want to consider moving all address updates to process context since the locking seems overly complicated, for now just fix the bogus warning by changing ASSERT_RTNL to use mutex_is_locked(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01tcp: tcp_probe buffer overflow and incorrect return valueTom Quetchenbach
[ Upstream commit: 8d390efd903485923419584275fd0c2aa4c94183 ] tcp_probe has a bounds-checking bug that causes many programs (less, python) to crash reading /proc/net/tcp_probe. When it outputs a log line to the reader, it only checks if that line alone will fit in the reader's buffer, rather than that line and all the previous lines it has already written. tcpprobe_read also returns the wrong value if copy_to_user fails--it just passes on the return value of copy_to_user (number of bytes not copied), which makes a failure look like a success. This patch fixes the buffer overflow and sets the return value to -EFAULT if copy_to_user fails. Patch is against latest net-2.6; tested briefly and seems to fix the crashes in less and python. Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>