summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2009-03-16net: Kill skb_truesize_check(), it only catches false-positives.David S. Miller
[ Upstream commit 92a0acce186cde8ead56c6915d9479773673ea1a ] A long time ago we had bugs, primarily in TCP, where we would modify skb->truesize (for TSO queue collapsing) in ways which would corrupt the socket memory accounting. skb_truesize_check() was added in order to try and catch this error more systematically. However this debugging check has morphed into a Frankenstein of sorts and these days it does nothing other than catch false-positives. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16net: amend the fix for SO_BSDCOMPAT gsopt infoleakEugene Teo
[ Upstream commit 50fee1dec5d71b8a14c1b82f2f42e16adc227f8b ] The fix for CVE-2009-0676 (upstream commit df0bca04) is incomplete. Note that the same problem of leaking kernel memory will reappear if someone on some architecture uses struct timeval with some internal padding (for example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to leak the padded bytes to userspace. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix data corruption when splicing from sockets.Jarek Poplawski
[ Upstream commit 8b9d3728977760f6bd1317c4420890f73695354e ] The trick in socket splicing where we try to convert the skb->data into a page based reference using virt_to_page() does not work so well. The idea is to pass the virt_to_page() reference via the pipe buffer, and refcount the buffer using a SKB reference. But if we are splicing from a socket to a socket (via sendpage) this doesn't work. The from side processing will grab the page (and SKB) references. The sendpage() calls will grab page references only, return, and then the from side processing completes and drops the SKB ref. The page based reference to skb->data is not enough to keep the kmalloc() buffer backing it from being reused. Yet, that is all that the socket send side has at this point. This leads to data corruption if the skb->data buffer is reused by SLAB before the send side socket actually gets the TX packet out to the device. The fix employed here is to simply allocate a page and copy the skb->data bytes into that page. This will hurt performance, but there is no clear way to fix this properly without a copy at the present time, and it is important to get rid of the data corruption. With fixes from Herbert Xu. Tested-by: Willy Tarreau <w@1wt.eu> Foreseen-by: Changli Gao <xiaosuo@gmail.com> Diagnosed-by: Willy Tarreau <w@1wt.eu> Reported-by: Willy Tarreau <w@1wt.eu> Fixed-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17netfilter: xt_sctp: sctp chunk mapping doesn't workQu Haoran
netfilter: xt_sctp: sctp chunk mapping doesn't work Upstream commit: d4e2675a When user tries to map all chunks given in argument, kernel works on a copy of the chunkmap, but at the end it doesn't check the copy, but the orginal one. Signed-off-by: Qu Haoran <haoran.qu@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17netfilter: fix tuple inversion for Node information requestEric Leblond
netfilter: fix tuple inversion for Node information request Upstream commit: a51f42f3c The patch fixes a typo in the inverse mapping of Node Information request. Following draft-ietf-ipngwg-icmp-name-lookups-09, "Querier" sends a type 139 (ICMPV6_NI_QUERY) packet to "Responder" which answer with a type 140 (ICMPV6_NI_REPLY) packet. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits.Dimitris Michailidis
[ Upstream commit 9fa5fdf291c9b58b1cb8b4bb2a0ee57efa21d635 ] tcp_splice_data_recv has two lengths to consider: the len parameter it gets from tcp_read_sock, which specifies the amount of data in the skb, and rd_desc->count, which is the amount of data the splice caller still wants. Currently it passes just the latter to skb_splice_bits, which then splices min(rd_desc->count, skb->len - offset) bytes. Most of the time this is fine, except when the skb contains urgent data. In that case len goes only up to the urgent byte and is less than skb->len - offset. By ignoring len tcp_splice_data_recv may a) splice data tcp_read_sock told it not to, b) return to tcp_read_sock a value > len. Now, tcp_read_sock doesn't handle used > len and leaves the socket in a bad state (both sk_receive_queue and copied_seq are bad at that point) resulting in duplicated data and corruption. Fix by passing min(rd_desc->count, len) to skb_splice_bits. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17tcp: splice as many packets as possible at onceWilly Tarreau
[ Upstream commit 33966dd0e2f68f26943cd9ee93ec6abbc6547a8e ] As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not optimal. It processes at most one segment per call. This results in low performance and very high overhead due to syscall rate when splicing from interfaces which do not support LRO. Willy provided a patch inside tcp_splice_read(), but a better fix is to let tcp_read_sock() process as many segments as possible, so that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less often. With this change, splice() behaves like tcp_recvmsg(), being able to consume many skbs in one system call. With typical 1460 bytes of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return 16*1460 = 23360 bytes. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2Clément Lecigne
[ Upstream commit df0bca049d01c0ee94afb7cd5dfd959541e6c8da ] In function sock_getsockopt() located in net/core/sock.c, optval v.val is not correctly initialized and directly returned in userland in case we have SO_BSDCOMPAT option set. This dummy code should trigger the bug: int main(void) { unsigned char buf[4] = { 0, 0, 0, 0 }; int len; int sock; sock = socket(33, 2, 2); getsockopt(sock, 1, SO_BSDCOMPAT, &buf, &len); printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]); close(sock); } Here is a patch that fix this bug by initalizing v.val just after its declaration. Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17ipv6: Copy cork options in ip6_append_dataHerbert Xu
[ Upstream commit 0178b695fd6b40a62a215cbeb03dd51ada3bb5e0 ] As the options passed to ip6_append_data may be ephemeral, we need to duplicate it for corking. This patch applies the simplest fix which is to memdup all the relevant bits. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17ipv6: Disallow rediculious flowlabel option sizes.David S. Miller
[ Upstream commit 684de409acff8b1fe8bf188d75ff2f99c624387d ] Just like PKTINFO, limit the options area to 64K. Based upon report by Eric Sesterhenn and analysis by Roland Dreier. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17udp: increments sk_drops in __udp_queue_rcv_skb()Eric Dumazet
[ Upstream commit e408b8dcb5ce42243a902205005208e590f28454 ] Commit 93821778def10ec1e69aa3ac10adee975dad4ff3 (udp: Fix rcv socket locking) accidentally removed sk_drops increments for UDP IPV4 sockets. This field can be used to detect incorrect sizing of socket receive buffers. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17udp: Fix UDP short packet false positiveJesper Dangaard Brouer
[ Upstream commit 7b5e56f9d635643ad54f2f42e69ad16b80a2cff1 ] The UDP header pointer assignment must happen after calling pskb_may_pull(). As pskb_may_pull() can potentially alter the SKB buffer. This was exposted by running multicast traffic through the NIU driver, as it won't prepull the protocol headers into the linear area on receive. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17packet: Avoid lock_sock in mmap handlerHerbert Xu
[ Upstream commit 905db44087855e3c1709f538ecdc22fd149cadd8 ] As the mmap handler gets called under mmap_sem, and we may grab mmap_sem elsewhere under the socket lock to access user data, we should avoid grabbing the socket lock in the mmap handler. Since the only thing we care about in the mmap handler is for pg_vec* to be invariant, i.e., to exclude packet_set_ring, we can achieve this by simply using a new mutex. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Martin MOKREJŠ <mmokrejs@ribosome.natur.cuni.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: packet socket packet_lookup_frame fixSebastiano Di Paola
[ Upstream commit f9e6934502e46c363100245f137ddf0f4b1cb574 ] packet_lookup_frames() fails to get user frame if current frame header status contains extra flags. This is due to the wrong assumption on the operators precedence during frame status tests. Fixed by forcing the right operators precedence order with explicit brackets. Signed-off-by: Paolo Abeni <paolo.abeni@gmail.com> Signed-off-by: Sebastiano Di Paola <sebastiano.dipaola@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17ipv4: fix infinite retry loop in IP-ConfigBenjamin Zores
[ Upstream commit 9d8dba6c979fa99c96938c869611b9a23b73efa9 ] Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix OOPS in skb_seq_read().Shyam Iyer
[ Upstream commit 71b3346d182355f19509fadb8fe45114a35cc499 ] It oopsd for me in skb_seq_read. addr2line said it was linux-2.6/net/core/skbuff.c:2228, which is this line: while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) { I added some printks in there and it looks like we hit this: } else if (st->root_skb == st->cur_skb && skb_shinfo(st->root_skb)->frag_list) { st->cur_skb = skb_shinfo(st->root_skb)->frag_list; st->frag_idx = 0; goto next_skb; } Actually I did some testing and added a few printks and found that the st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null. This caused the kernel panic. if (abs_offset < block_limit) { - *data = st->cur_skb->data + abs_offset; + *data = st->cur_skb->data + (abs_offset - st->stepped_offset); I enabled the debug_tcp and with a few printks found that the code did not go to the next_skb label and could find that the sequence being followed was this - It hit this if condition - if (st->cur_skb->next) { st->cur_skb = st->cur_skb->next; st->frag_idx = 0; goto next_skb; And so, now the st pointer is shifted to the next skb whereas actually it should have hit the second else if first since the data is in the frag_list. else if (st->root_skb == st->cur_skb && skb_shinfo(st->root_skb)->frag_list) { st->cur_skb = skb_shinfo(st->root_skb)->frag_list; goto next_skb; } Reversing the two conditions the attached patch fixes the issue for me on top of Herbert's patches. Signed-off-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17net: Fix frag_list handling in skb_seq_readHerbert Xu
[ Upstream commit 95e3b24cfb4ec0479d2c42f7a1780d68063a542a ] The frag_list handling was broken in skb_seq_read: 1) We didn't add the stepped offset when looking at the head are of fragments other than the first. 2) We didn't take the stepped offset away when setting the data pointer in the head area. 3) The frag index wasn't reset. This patch fixes both issues. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17sctp: Properly timestamp outgoing data chunks for rtx purposesVlad Yasevich
[ Upstream commit 759af00ebef858015eb68876ac1f383bcb6a1774 ] Recent changes to the retransmit code exposed a long standing bug where it was possible for a chunk to be time stamped after the retransmit timer was reset. This caused a rare situation where the retrnamist timer has expired, but nothing was marked for retrnasmission because all of timesamps on data were less then 1 rto ago. As result, the timer was never restarted since nothing was retransmitted, and this resulted in a hung association that did couldn't complete the data transfer. The solution is to timestamp the chunk when it's added to the packet for transmission purposes. After the packet is trsnmitted the rtx timer is restarted. This guarantees that when the timer expires, there will be data to retransmit. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17sctp: Correctly start rtx timer on new packet transmissions.Vlad Yasevich
[ Upstream commit 6574df9a89f9f7da3a4e5cee7633d430319d3350 ] Commit 62aeaff5ccd96462b7077046357a6d7886175a57 (sctp: Start T3-RTX timer when fast retransmitting lowest TSN) introduced a regression where it was possible to forcibly restart the sctp retransmit timer at the transmission of any new chunk. This resulted in much longer timeout times and sometimes hung sctp connections. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-17mac80211: restrict to AP in outgoing interface heuristicJohannes Berg
commit f1b33cb1c25ac476cbf22783f9ca2016f99648ed upstream. We try to find the correct outgoing interface for injected frames based on the TA, but since this is a hack for hostapd 11w, restrict the heuristic to AP mode interfaces. At some point we'll add the ability to give an interface index in radiotap or so and just remove this heuristic again. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-12sctp: Fix another socket race during accept/peeloffVlad Yasevich
commit ae53b5bd77719fed58086c5be60ce4f22bffe1c6 upstream. There is a race between sctp_rcv() and sctp_accept() where we have moved the association from the listening socket to the accepted socket, but sctp_rcv() processing cached the old socket and continues to use it. The easy solution is to check for the socket mismatch once we've grabed the socket lock. If we hit a mis-match, that means that were are currently holding the lock on the listening socket, but the association is refrencing a newly accepted socket. We need to drop the lock on the old socket and grab the lock on the new one. A more proper solution might be to create accepted sockets when the new association is established, similar to TCP. That would eliminate the race for 1-to-1 style sockets, but it would still existing for 1-to-many sockets where a user wished to peeloff an association. For now, we'll live with this easy solution as it addresses the problem. Reported-by: Michal Hocko <mhocko@suse.cz> Reported-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-06minstrel: fix warning if lowest supported rate index is not 0Christian Lamparter
commit d57854bb1d78ba89ffbfdfd1c3e95b52ed7478ff upstream This patch fixes the following WARNING (caused by rix_to_ndx): " >WARNING: at net/mac80211/rc80211_minstrel.c:69 minstrel_rate_init+0xd2/0x33a [mac80211]() >[...] >Call Trace: > warn_on_slowpath+0x51/0x75 > _format_mac_addr+0x4c/0x88 > minstrel_rate_init+0xd2/0x33a [mac80211] > print_mac+0x16/0x1b > schedule_hrtimeout_range+0xdc/0x107 > ieee80211_add_station+0x158/0x1bd [mac80211] > nl80211_new_station+0x1b3/0x20b [cfg80211] The reason is that I'm experimenting with "g" only mode on a 802.11 b/g card. Therefore rate_lowest_index returns 4 (= 6Mbit, instead of usual 0 = 1Mbit). Since mi->r array is initialized with zeros in minstrel_alloc_sta, rix_to_ndx has a hard time to find the 6Mbit entry and will trigged the WARNING. Signed-off-by: Christian Lamparter <chunkeey@web.de> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-06netfilter: ctnetlink: fix scheduling while atomicPatrick McHardy
commit 748085fcbedbf7b0f38d95e178265d7b13360b44 upstream. Caused by call to request_module() while holding nf_conntrack_lock. Reported-and-tested-by: Kövesdi György <kgy@teledigit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02NET: net_namespace, fix lock imbalanceJiri Slaby
commit 357f5b0b91054ae23385ea4b0634bb8b43736e83 upstream. register_pernet_gen_subsys omits mutex_unlock in one fail path. Fix it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02mac80211: decrement ref count to netdev after launching mesh discoveryBrian Cavagnolo
commit 5dc306f3bd1d4cfdf79df39221b3036eab1ddcf3 upstream. After launching mesh discovery in tx path, reference count was not being decremented. This was preventing module unload. Signed-off-by: Brian Cavagnolo <brian@cozybit.com> Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24netfilter: nf_conntrack: fix ICMP/ICMPv6 timeout sysctls on big-endianPatrick McHardy
Upstream commit 71320af: An old bug crept back into the ICMP/ICMPv6 conntrack protocols: the timeout values are defined as unsigned longs, the sysctl's maxsize is set to sizeof(unsigned int). Use unsigned int for the timeout values as in the other conntrack protocols. Reported-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24netfilter: ebtables: fix inversion in match codePatrick McHardy
Upstream commit d61ba9f: Commit 8cc784ee (netfilter: change return types of match functions for ebtables extensions) broke ebtables matches by inverting the sense of match/nomatch. Reported-by: Matt Cross <matthltc@us.ibm.com> Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24netfilter: x_tables: fix match/target revision lookupPatrick McHardy
Upstream commit 656caff: Commit 55b69e91 (netfilter: implement NFPROTO_UNSPEC as a wildcard for extensions) broke revision probing for matches and targets that are registered with NFPROTO_UNSPEC. Fix by continuing the search on the NFPROTO_UNSPEC list if nothing is found on the af-specific lists. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24tcp: don't mask EOF and socket errors on nonblocking splice receiveLennert Buytenhek
[ Upstream commit: 4f7d54f59bc470f0aaa932f747a95232d7ebf8b1 ] Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket results in masking of EOF (RDHUP) and error conditions on the socket by an -EAGAIN return. Move the NONBLOCK check in tcp_splice_read() to be after the EOF and error checks to fix this. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24pkt_sched: cls_u32: Fix locking in u32_change()Jarek Poplawski
[ Upstream commit: 6f57321422e0d359e83c978c2b03db77b967b7d5 ] New nodes are inserted in u32_change() under rtnl_lock() with wmb(), so without tcf_tree_lock() like in other classifiers (e.g. cls_fw). This isn't enough without rmb() on the read side, but on the other hand adding such barriers doesn't give any savings, so the lock is added instead. Reported-by: m0sia <m0sia@plotinka.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24sctp: Avoid memory overflow while FWD-TSN chunk is received with bad stream IDWei Yongjun
[ Upstream commit: 9fcb95a105758b81ef0131cd18e2db5149f13e95 ] If FWD-TSN chunk is received with bad stream ID, the sctp will not do the validity check, this may cause memory overflow when overwrite the TSN of the stream ID. The FORWARD-TSN chunk is like this: FORWARD-TSN chunk Type = 192 Flags = 0 Length = 172 NewTSN = 99 Stream = 10000 StreamSequence = 0xFFFF This patch fix this problem by discard the chunk if stream ID is not less than MIS. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24ipv6: Fix fib6_dump_table walker leakHerbert Xu
[ Upstream commit: 7891cc818967e186be68caac32d84bfd0a3f0bd2 ] When a fib6 table dump is prematurely ended, we won't unlink its walker from the list. This causes all sorts of grief for other users of the list later. Reported-by: Chris Caputo <ccaputo@alt.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-24pkt_sched: sch_htb: Fix deadlock in hrtimers triggered by HTBJarek Poplawski
[ Upstream commit: none This is a quick fix for -stable purposes. Upstream fixes these problems via a large set of invasive hrtimer changes. ] Most probably there is a (still unproven) race in hrtimers (before 2.6.29 kernels), which causes a corruption of hrtimers rbtree. This patch doesn't fix it, but should let HTB avoid triggering the bug. Reported-by: Denys Fedoryschenko <denys@visp.net.lb> Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Reported-by: Chris Caputo <ccaputo@alt.net> Tested-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 22Heiko Carstens
commit 3e0fa65f8ba4fd24b3dcfaf14d5b15eaab0fdc61 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 21Heiko Carstens
commit 20f37034fb966a1c35894f9fe529fda0b6440101 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-18System call wrappers part 07Heiko Carstens
commit 754fe8d297bfae7b77f7ce866e2fb0c5fb186506 upstream. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-18net: Fix module refcount leak in kernel_accept()Wei Yongjun
The kernel_accept() does not hold the module refcount of newsock->ops->owner, so we need __module_get(newsock->ops->owner) code after call kernel_accept() by hand. In sunrpc, the module refcount is missing to hold. So this cause kernel panic. Used following script to reproduct: while [ 1 ]; do mount -t nfs4 192.168.0.19:/ /mnt touch /mnt/file umount /mnt lsmod | grep ipv6 done This patch fixed the problem by add __module_get(newsock->ops->owner) to kernel_accept(). So we do not need to used __module_get(newsock->ops->owner) in every place when used kernel_accept(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: Phonet: keep TX queue disabled when the device is off SCHED: netem: Correct documentation comment in code. netfilter: update rwlock initialization for nat_table netlabel: Compiler warning and NULL pointer dereference fix e1000e: fix double release of mutex IA64: HP_SIMETH needs to depend upon NET netpoll: fix race on poll_list resulting in garbage entry ipv6: silence log messages for locally generated multicast sungem: improve ethtool output with internal pcs and serdes tcp: tcp_vegas cong avoid fix sungem: Make PCS PHY support partially work again.
2008-12-15Phonet: keep TX queue disabled when the device is offRémi Denis-Courmont
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15SCHED: netem: Correct documentation comment in code.Jesper Dangaard Brouer
The netem simulator is no longer limited by Linux timer resolution HZ. Not since Patrick McHardy changed the QoS system to use hrtimer. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-15netfilter: update rwlock initialization for nat_tableSteven Rostedt
The commit e099a173573ce1ba171092aee7bb3c72ea686e59 (netfilter: netns nat: per-netns NAT table) renamed the nat_table from __nat_table to nat_table without updating the __RW_LOCK_UNLOCKED(__nat_table.lock). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-11netlabel: Compiler warning and NULL pointer dereference fixPaul Moore
Fix the two compiler warnings show below. Thanks to Geert Uytterhoeven for finding and reporting the problem. net/netlabel/netlabel_unlabeled.c:567: warning: 'entry' may be used uninitialized in this function net/netlabel/netlabel_unlabeled.c:629: warning: 'entry' may be used uninitialized in this function Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09netpoll: fix race on poll_list resulting in garbage entryNeil Horman
A few months back a race was discused between the netpoll napi service path, and the fast path through net_rx_action: http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470 A patch was submitted for that bug, but I think we missed a case. Consider the following scenario: INITIAL STATE CPU0 has one napi_struct A on its poll_list CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same napi_struct A that CPU0 has on its list CPU0 CPU1 net_rx_action poll_napi !list_empty (returns true) locks poll_lock for A poll_one_napi napi->poll netif_rx_complete __napi_complete (removes A from poll_list) list_entry(list->next) In the above scenario, net_rx_action assumes that the per-cpu poll_list is exclusive to that cpu. netpoll of course violates that, and because the netpoll path can dequeue from the poll list, its possible for CPU0 to detect a non-empty list at the top of the while loop in net_rx_action, but have it become empty by the time it calls list_entry. Since the poll_list isn't surrounded by any other structure, the returned data from that list_entry call in this situation is garbage, and any number of crashes can result based on what exactly that garbage is. Given that its not fasible for performance reasons to place exclusive locks arround each cpus poll list to provide that mutal exclusion, I think the best solution is modify the netpoll path in such a way that we continue to guarantee that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've implemented the patch below. It adds an additional bit to the state field in the napi_struct. When executing napi->poll from the netpoll_path, this bit will be set. When a driver calls netif_rx_complete, if that bit is set, it will not remove the napi_struct from the poll_list. That work will be saved for the next iteration of net_rx_action. I've tested this and it seems to work well. About the biggest drawback I can see to it is the fact that it might result in an extra loop through net_rx_action in the event that the device is actually contended for (i.e. the netpoll path actually preforms all the needed work no the device, and the call to net_rx_action winds up doing nothing, except removing the napi_struct from the poll_list. However I think this is probably a small price to pay, given that the alternative is a crash. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09ipv6: silence log messages for locally generated multicastJan Sembera
This patch fixes minor annoyance during transmission of unsolicited neighbor advertisements from userspace to multicast addresses (as far as I can see in RFC, this is allowed and the similar functionality for IPv4 has been in arping for a long time). Outgoing multicast packets get reinserted into local processing as if they are received from the network. The machine thus sees its own NA and fills the logs with error messages. This patch removes the message if NA has been generated locally. Signed-off-by: Jan Sembera <jsembera@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09tcp: tcp_vegas cong avoid fix Doug Leith
This patch addresses a book-keeping issue in tcp_vegas.c. At present tcp_vegas does separate book-keeping of cwnd based on packet sequence numbers. A mismatch can develop between this book-keeping and tp->snd_cwnd due, for example, to delayed acks acking multiple packets. When vegas transitions to reno operation (e.g. following loss), then this mismatch leads to incorrect behaviour (akin to a cwnd backoff). This seems mostly to affect operation at low cwnds where delayed acking can lead to a significant fraction of cwnd being covered by a single ack, leading to the book-keeping mismatch. This patch modifies the congestion avoidance update to avoid the need for separate book-keeping while leaving vegas congestion avoidance functionally unchanged. A secondary advantage of this modification is that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic is no longer necessary, simplifying the code. Some example test measurements with the patched code (confirming no functional change in the congestion avoidance algorithm) can be seen at: http://www.hamilton.ie/doug/vegaspatch/ Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: tproxy: fixe a possible read from an invalid location in the socket match zd1211rw: use unaligned safe memcmp() in-place of compare_ether_addr() mac80211: use unaligned safe memcmp() in-place of compare_ether_addr() ipw2200: fix netif_*_queue() removal regression iwlwifi: clean key table in iwl_clear_stations_table function tcp: tcp_vegas ssthresh bug fix can: omit received RTR frames for single ID filter lists ATM: CVE-2008-5079: duplicate listen() on socket corrupts the vcc table netx-eth: initialize per device spinlock tcp: make urg+gso work for real this time enc28j60: Fix sporadic packet loss (corrected again) hysdn: fix writing outside the field on 64 bits b1isa: fix b1isa_exit() to really remove registered capi controllers can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter Phonet: do not dump addresses from other namespaces netlabel: Fix a potential NULL pointer dereference bnx2: Add workaround to handle missed MSI. xfrm: Fix kernel panic when flush and dump SPD entries
2008-12-07tproxy: fixe a possible read from an invalid location in the socket matchBalazs Scheidler
TIME_WAIT sockets need to be handled specially, and the socket match casted inet_timewait_sock instances to inet_sock, which are not compatible. Handle this special case by checking sk->sk_state. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-05Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2008-12-05mac80211: use unaligned safe memcmp() in-place of compare_ether_addr()Shaddy Baddah
After fixing zd1211rw: use unaligned safe memcmp() in-place of compare_ether_addr(), I started to see kernel log messages detailing unaligned access: Kernel unaligned access at TPC[100f7f44] sta_info_get+0x24/0x68 [mac80211] As with the aforementioned patch, the unaligned access was eminating from a compare_ether_addr() call. Concerned that whilst it was safe to assume that unalignment was the norm for the zd1211rw, and take preventative measures, it may not be the case or acceptable to use the easy fix of changing the call to memcmp(). My research however indicated that it was OK to do this, as there are a few instances where memcmp() is the preferred mechanism for doing mac address comparisons throughout the module. Signed-off-by: Shaddy Baddah <shaddy_baddah@hotmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-12-04tcp: tcp_vegas ssthresh bug fixDoug Leith
This patch fixes a bug in tcp_vegas.c. At the moment this code leaves ssthresh untouched. However, this means that the vegas congestion control algorithm is effectively unable to reduce cwnd below the ssthresh value (if the vegas update lowers the cwnd below ssthresh, then slow start is activated to raise it back up). One example where this matters is when during slow start cwnd overshoots the link capacity and a flow then exits slow start with ssthresh set to a value above where congestion avoidance would like to adjust it. Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>