summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2012-03-19IPv6: Fix not join all-router mcast group when forwarding set.Li Wei
[ Upstream commit d6ddef9e641d1229d4ec841dc75ae703171c3e92 ] When forwarding was set and a new net device is register, we need add this device to the all-router mcast group. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_unaNeal Cardwell
[ Upstream commit 4648dc97af9d496218a05353b0e442b3dfa6aaab ] This commit fixes tcp_shift_skb_data() so that it does not shift SACKed data below snd_una. This fixes an issue whose symptoms exactly match reports showing tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at net/ipv4/tcp_input.c:3418" thread on netdev). Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41) tcp_shift_skb_data() had been shifting SACKed ranges that were below snd_una. It checked that the *end* of the skb it was about to shift from was above snd_una, but did not check that the end of the actual shifted range was above snd_una; this commit adds that check. Shifting SACKed ranges below snd_una is problematic because for such ranges tcp_sacktag_one() short-circuits: it does not declare anything as SACKed and does not increase sacked_out. Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9 and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges below snd_una happened to work because tcp_shifted_skb() was always (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence tcp_sacktag_one() never short-circuited and always increased tp->sacked_out in this case. After those two fixes, my testing has verified that shifting SACKed ranges below snd_una could cause tp->sacked_out to go negative with the following sequence of events: (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una, then shifts a prefix of that skb that is below snd_una (2) tcp_shifted_skb() increments the packet count of the already-SACKed prev sk_buff (3) tcp_sacktag_one() sees the end of the new SACKed range is below snd_una, so it short-circuits and doesn't increase tp->sacked_out (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed, decrements tp->sacked_out by this "inflated" pcount that was missing a matching increase in tp->sacked_out, and hence tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted to s32 is negative. (6) this leads to the warnings seen in the recent "WARNING: at net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.: tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0); More generally, I think this bug can be tickled in some cases where two or more ACKs from the receiver are lost and then a DSACK arrives that is immediately above an existing SACKed skb in the write queue. This fix changes tcp_shift_skb_data() to abort this sequence at step (1) in the scenario above by noticing that the bytes are below snd_una and not shifting them. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19bridge: check return value of ipv6_dev_get_saddr()Ulrich Weber
[ Upstream commit d1d81d4c3dd886d5fa25a2c4fa1e39cb89613712 ] otherwise source IPv6 address of ICMPV6_MGM_QUERY packet might be random junk if IPv6 is disabled on interface or link-local address is not yet ready (DAD). Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19tcp: don't fragment SACKed skbs in tcp_mark_head_lost()Neal Cardwell
[ Upstream commit c0638c247f559e1a16ee79e54df14bca2cb679ea ] In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb to mark the first portion as lost. This is for two primary reasons: (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When doing this, it preserves the sum of their packet counts in order to reflect the real-world dynamics on the wire. But given that skbs can have remainders that do not align to MSS boundaries, this packet count preservation means that for SACKed skbs there is not necessarily a direct linear relationship between tcp_skb_pcount(skb) and skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment off and mark as lost a prefix of length (packets - oldcnt)*mss from SACKed skbs were leading to occasional failures of the WARN_ON(len > skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the recent "crash in tcp_fragment" thread on netdev). (2) there is no real point in fragmenting off part of a SACKed skb and calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP for SACKed skbs. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19tcp: fix false reordering signal in tcp_shifted_skbNeal Cardwell
[ Upstream commit 4c90d3b30334833450ccbb02f452d4972a3c3c3f ] When tcp_shifted_skb() shifts bytes from the skb that is currently pointed to by 'highest_sack' then the increment of TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This implicit advancement, combined with the recent fix to pass the correct SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think that the newly SACKed range was before the tcp_highest_sack_seq(), leading to a call to tcp_update_reordering() with a degree of reordering matching the size of the newly SACKed range (typically just 1 packet, which is a NOP, but potentially larger). This commit fixes this by simply calling tcp_sacktag_one() before the TCP_SKB_CB(skb)->seq advancement that can advance our notion of the highest SACKed sequence. Correspondingly, we can simplify the code a little now that tcp_shifted_skb() should update the lost_cnt_hint in all cases where skb == tp->lost_skb_hint. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19ipsec: be careful of non existing mac headersEric Dumazet
[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19neighbour: Fixed race condition at tbl->nhtMichel Machado
[ Upstream commit 84338a6c9dbb6ff3de4749864020f8f25d86fc81 ] When the fixed race condition happens: 1. While function neigh_periodic_work scans the neighbor hash table pointed by field tbl->nht, it unlocks and locks tbl->lock between buckets in order to call cond_resched. 2. Assume that function neigh_periodic_work calls cond_resched, that is, the lock tbl->lock is available, and function neigh_hash_grow runs. 3. Once function neigh_hash_grow finishes, and RCU calls neigh_hash_free_rcu, the original struct neigh_hash_table that function neigh_periodic_work was using doesn't exist anymore. 4. Once back at neigh_periodic_work, whenever the old struct neigh_hash_table is accessed, things can go badly. Signed-off-by: Michel Machado <michel@digirati.com.br> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12mac80211: zero initialize count field in ieee80211_tx_rateMohammed Shafi Shajakhan
commit 8617b093d0031837a7be9b32bc674580cfb5f6b5 upstream. rate control algorithms concludes the rate as invalid with rate[i].idx < -1 , while they do also check for rate[i].count is non-zero. it would be safer to zero initialize the 'count' field. recently we had a ath9k rate control crash where the ath9k rate control in ath_tx_status assumed to check only for rate[i].count being non-zero in one instance and ended up in using invalid rate index for 'connection monitoring NULL func frames' which eventually lead to the crash. thanks to Pavel Roskin for fixing it and finding the root cause. https://bugzilla.redhat.com/show_bug.cgi?id=768639 Cc: Pavel Roskin <proski@gnu.org> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29ipvs: fix matching of fwmark templates during schedulingSimon Horman
commit e0aac52e17a3db68fe2ceae281780a70fc69957f upstream. Commit f11017ec2d1859c661f4e2b12c4a8d250e1f47cf (2.6.37) moved the fwmark variable in subcontext that is invalidated before reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer in the param structure make sure the fwmark variable is in same context. As the fwmark templates can not be matched, more and more template connections are created and the controlled connections can not go to single real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACKNeal Cardwell
[ Upstream commit 0af2a0d0576205dda778d25c6c344fc6508fc81d ] This commit ensures that lost_cnt_hint is correctly updated in tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders need their own adjustment. This applies the spirit of 1e5289e121372a3494402b1b131b41bfe1cf9b7f - except now that the sequence range passed into tcp_sacktag_one() is correct we need only have a special case adjustment for FACK. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one()Neal Cardwell
[ Upstream commit daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 ] Fix the newly-SACKed range to be the range of newly-shifted bytes. Previously - since 832d11c5cd076abc0aa1eaf7be96c81d1a59ce41 - tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start and end sequence numbers of the skb it passes in set to the range just beyond the range that is newly-SACKed. This commit also removes a special-case adjustment to lost_cnt_hint in tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint in tcp_sacktag_one() now properly handles this things now that the correct start sequence number is passed in. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbsNeal Cardwell
[ Upstream commit cc9a672ee522d4805495b98680f4a3db5d0a0af9 ] This commit allows callers of tcp_sacktag_one() to pass in sequence ranges that do not align with skb boundaries, as tcp_shifted_skb() needs to do in an upcoming fix in this patch series. In fact, now tcp_sacktag_one() does not need to depend on an input skb at all, which makes its semantics and dependencies more clear. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29gro: more generic L2 header checkEric Dumazet
[ Upstream commit 5ca3b72c5da47d95b83857b768def6172fbc080a ] Shlomo Pongratz reported GRO L2 header check was suited for Ethernet only, and failed on IB/ipoib traffic. He provided a patch faking a zeroed header to let GRO aggregates frames. Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header check to be more generic, ie not assuming L2 header is 14 bytes, but taking into account hard_header_len. __napi_gro_receive() has special handling for the common case (Ethernet) to avoid a memcmp() call and use an inline optimized function instead. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Shlomo Pongratz <shlomop@mellanox.com> Cc: Roland Dreier <roland@kernel.org> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net: Make qdisc_skb_cb upper size bound explicit.David S. Miller
[ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ] Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside of other data structures. This is intended to be used by IPoIB so that it can remember addressing information stored at hard_header_ops->create() time that it can fetch when the packet gets to the transmit routine. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.Li Wei
[ Upstream commit 5dc7883f2a7c25f8df40d7479687153558cd531b ] This patch fix a bug which introduced by commit ac8a4810 (ipv4: Save nexthop address of LSRR/SSRR option to IPCB.).In that patch, we saved the nexthop of SRR in ip_option->nexthop and update iph->daddr until we get to ip_forward_options(), but we need to update it before ip_rt_get_source(), otherwise we may get a wrong src. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29tcp_v4_send_reset: binding oif to iif in no sock caseShawn Lu
[ Upstream commit e2446eaab5585555a38ea0df4e01ff313dbb4ac9 ] Binding RST packet outgoing interface to incoming interface for tcp v4 when there is no socket associate with it. when sk is not NULL, using sk->sk_bound_dev_if instead. (suggested by Eric Dumazet). This has few benefits: 1. tcp_v6_send_reset already did that. 2. This helps tcp connect with SO_BINDTODEVICE set. When connection is lost, we still able to sending out RST using same interface. 3. we are sending reply, it is most likely to be succeed if iif is used Signed-off-by: Shawn Lu <shawn.lu@ericsson.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net_sched: Bug in netem reorderingHagen Paul Pfeifer
[ Upstream commit eb10192447370f19a215a8c2749332afa1199d46 ] Not now, but it looks you are correct. q->qdisc is NULL until another additional qdisc is attached (beside tfifo). See 50612537e9ab2969312. The following patch should work. From: Hagen Paul Pfeifer <hagen@jauu.net> netem: catch NULL pointer by updating the real qdisc statistic Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29netpoll: netpoll_poll_dev() should access dev->flagsEric Dumazet
[ Upstream commit 58e05f357a039a94aa36475f8c110256f693a239 ] commit 5a698af53f (bond: service netpoll arp queue on master device) tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: WANG Cong <amwang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabledThomas Graf
[ Upstream commit 70620c46ac2b45c24b0f22002fdf5ddd1f7daf81 ] Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed the behavior of arp proxy to send arp replies back out on the interface the request came in even if the private VLAN feature is disabled. Previously we checked rt->dst.dev != skb->dev for in scenarios, when proxy arp is enabled on for the netdevice and also when individual proxy neighbour entries have been added. This patch adds the check back for the pneigh_lookup() scenario. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29mac80211: Fix a rwlock bad magic bugMohammed Shafi Shajakhan
commit b57e6b560fc2a2742910ac5ca0eb2c46e45aeac2 upstream. read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig (->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing it. the intilization of this read/write lock happens via the path ieee80211_led_init (->) led_trigger_register, but we are doing 'ieee80211_led_init' after 'ieeee80211_if_add' where we register netdev_ops. so we access leddev_list_lock before initializing it and causes the following bug in chrome laptops with AR928X cards with the following script while true do sudo modprobe -v ath9k sleep 3 sudo modprobe -r ath9k sleep 3 done BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1 Call Trace: [<8137b9df>] rwlock_bug+0x3d/0x47 [<81179830>] do_raw_read_lock+0x19/0x29 [<8137f063>] _raw_read_lock+0xd/0xf [<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211] [<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211] [<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211] [<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211] [<f9076d97>] ieee80211_open+0x48/0x4c [mac80211] [<812dbed8>] __dev_open+0x82/0xab [<812dc0c9>] __dev_change_flags+0x9c/0x113 [<812dc1ae>] dev_change_flags+0x18/0x44 [<8132144f>] devinet_ioctl+0x243/0x51a [<81321ba9>] inet_ioctl+0x93/0xac [<812cc951>] sock_ioctl+0x1c6/0x1ea [<812cc78b>] ? might_fault+0x20/0x20 [<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2 [<810a6ebb>] ? fget_light+0x2f/0x70 [<812ce549>] ? sys_recvmsg+0x3e/0x48 [<810b1f35>] sys_ioctl+0x46/0x69 [<8137fa77>] sysenter_do_call+0x12/0x2 Cc: Gary Morain <gmorain@google.com> Cc: Paul Stewart <pstew@google.com> Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com> Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com> Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Acked-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20mac80211: timeout a single frame in the rx reorder bufferEliad Peller
commit 07ae2dfcf4f7143ce191c6436da1c33f179af0d6 upstream. The current code checks for stored_mpdu_num > 1, causing the reorder_timer to be triggered indefinitely, but the frame is never timed-out (until the next packet is received) Signed-off-by: Eliad Peller <eliad@wizery.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03tcp: md5: using remote adress for md5 lookup in rst packetshawnlu
[ Upstream commit 8a622e71f58ec9f092fc99eacae0e6cf14f6e742 ] md5 key is added in socket through remote address. remote address should be used in finding md5 key when sending out reset packet. Signed-off-by: shawnlu <shawn.lu@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03tcp: fix tcp_trim_head() to adjust segment count with skb MSSNeal Cardwell
[ Upstream commit 5b35e1e6e9ca651e6b291c96d1106043c9af314a ] This commit fixes tcp_trim_head() to recalculate the number of segments in the skb with the skb's existing MSS, so trimming the head causes the skb segment count to be monotonically non-increasing - it should stay the same or go down, but not increase. Previously tcp_trim_head() used the current MSS of the connection. But if there was a decrease in MSS between original transmission and ACK (e.g. due to PMTUD), this could cause tcp_trim_head() to counter-intuitively increase the segment count when trimming bytes off the head of an skb. This violated assumptions in tcp_tso_acked() that tcp_trim_head() only decreases the packet count, so that packets_acked in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to pass u32 pkts_acked values as large as 0xffffffff to ca_ops->pkts_acked(). As an aside, if tcp_trim_head() had really wanted the skb to reflect the current MSS, it should have called tcp_set_skb_tso_segs() unconditionally, since a decrease in MSS would mean that a single-packet skb should now be sliced into multiple segments. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03rds: Make rds_sock_lock BH rather than IRQ safe.David S. Miller
[ Upstream commit efc3dbc37412c027e363736b4f4c74ee5e8ecffc ] rds_sock_info() triggers locking warnings because we try to perform a local_bh_enable() (via sock_i_ino()) while hardware interrupts are disabled (via taking rds_sock_lock). There is no reason for rds_sock_lock to be a hardware IRQ disabling lock, none of these access paths run in hardware interrupt context. Therefore making it a BH disabling lock is safe and sufficient to fix this bug. Reported-by: Kumar Sanghvi <kumaras@chelsio.com> Reported-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03net: reintroduce missing rcu_assign_pointer() callsEric Dumazet
[ Upstream commit cf778b00e96df6d64f8e21b8395d1f8a859ecdc7 ] commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03l2tp: l2tp_ip - fix possible oops on packet receiveJames Chapman
[ Upstream commit 68315801dbf3ab2001679fd2074c9dc5dcf87dfa ] When a packet is received on an L2TP IP socket (L2TPv3 IP link encapsulation), the l2tpip socket's backlog_rcv function calls xfrm4_policy_check(). This is not necessary, since it was called before the skb was added to the backlog. With CONFIG_NET_NS enabled, xfrm4_policy_check() will oops if skb->dev is null, so this trivial patch removes the call. This bug has always been present, but only when CONFIG_NET_NS is enabled does it cause problems. Most users are probably using UDP encapsulation for L2TP, hence the problem has only recently surfaced. EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0 EIP is at l2tp_ip_recvmsg+0xd4/0x2a7 EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246 ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Call Trace: [<c1218568>] sock_common_recvmsg+0x31/0x46 [<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d [<c12163a1>] __sock_recvmsg+0x31/0x3b [<c1216828>] sock_recvmsg+0x96/0xab [<c10b2693>] ? might_fault+0x47/0x81 [<c10b2693>] ? might_fault+0x47/0x81 [<c1167fd0>] ? _copy_from_user+0x31/0x115 [<c121e8c8>] ? copy_from_user+0x8/0xa [<c121ebd6>] ? verify_iovec+0x3e/0x78 [<c1216604>] __sys_recvmsg+0x10a/0x1aa [<c1216792>] ? sock_recvmsg+0x0/0xab [<c105a99b>] ? __lock_acquire+0xbdf/0xbee [<c12d5a99>] ? do_page_fault+0x193/0x375 [<c10d1200>] ? fcheck_files+0x9b/0xca [<c10d1259>] ? fget_light+0x2a/0x9c [<c1216bbb>] sys_recvmsg+0x2b/0x43 [<c1218145>] sys_socketcall+0x16d/0x1a5 [<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c100305f>] sysenter_do_call+0x12/0x38 Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c Signed-off-by: James Chapman <jchapman@katalix.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03af_unix: fix EPOLLET regression for stream socketsEric Dumazet
[ Upstream commit 6f01fd6e6f6809061b56e78f1e8d143099716d70 ] Commit 0884d7aa24 (AF_UNIX: Fix poll blocking problem when reading from a stream socket) added a regression for epoll() in Edge Triggered mode (EPOLLET) Appropriate fix is to use skb_peek()/skb_unlink() instead of skb_dequeue(), and only call skb_unlink() when skb is fully consumed. This remove the need to requeue a partial skb into sk_receive_queue head and the extra sk->sk_data_ready() calls that added the regression. This is safe because once skb is given to sk_receive_queue, it is not modified by a writer, and readers are serialized by u->readlock mutex. This also reduce number of spinlock acquisition for small reads or MSG_PEEK users so should improve overall performance. Reported-by: Nick Mathewson <nickm@freehaven.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexey Moiseytsev <himeraster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03net caif: Register properly as a pernet subsystem.Eric W. Biederman
[ Upstream commit 8a8ee9aff6c3077dd9c2c7a77478e8ed362b96c6 ] caif is a subsystem and as such it needs to register with register_pernet_subsys instead of register_pernet_device. Among other problems using register_pernet_device was resulting in net_generic being called before the caif_net structure was allocated. Which has been causing net_generic to fail with either BUG_ON's or by return NULL pointers. A more ugly problem that could be caused is packets in flight why the subsystem is shutting down. To remove confusion also remove the cruft cause by inappropriately trying to fix this bug. With the aid of the previous patch I have tested this patch and confirmed that using register_pernet_subsys makes the failure go away as it should. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03netns: fix net_alloc_generic()Eric Dumazet
[ Upstream commit 073862ba5d249c20bd5c49fc6d904ff0e1f6a672 ] When a new net namespace is created, we should attach to it a "struct net_generic" with enough slots (even empty), or we can hit the following BUG_ON() : [ 200.752016] kernel BUG at include/net/netns/generic.h:40! ... [ 200.752016] [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180 [ 200.752016] [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20 [ 200.752016] [<ffffffff825c41be>] caif_device_notify+0x2e/0x530 [ 200.752016] [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110 [ 200.752016] [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20 [ 200.752016] [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60 [ 200.752016] [<ffffffff821c2b26>] register_netdevice+0x196/0x300 [ 200.752016] [<ffffffff821c2ca9>] register_netdev+0x19/0x30 [ 200.752016] [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0 [ 200.752016] [<ffffffff821b5e62>] ops_init+0x42/0x180 [ 200.752016] [<ffffffff821b600b>] setup_net+0x6b/0x100 [ 200.752016] [<ffffffff821b6466>] copy_net_ns+0x86/0x110 [ 200.752016] [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190 net_alloc_generic() should take into account the maximum index into the ptr array, as a subsystem might use net_generic() anytime. This also reduces number of reallocations in net_assign_generic() Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sjur Brændeland <sjur.brandeland@stericsson.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03mac80211: fix work removal on deauth requestJohannes Berg
commit bc4934bc61d0a11fd62c5187ff83645628f8be8b upstream. When deauth is requested while an auth or assoc work item is in progress, we currently delete it without regard for any state it might need to clean up. Fix it by cleaning up for those items. In the case Pontus found, the problem manifested itself as such: authenticate with 00:23:69:aa:dd:7b (try 1) authenticated failed to insert Dummy STA entry for the AP (error -17) deauthenticating from 00:23:69:aa:dd:7b by local choice (reason=2) It could also happen differently if the driver uses the tx_sync callback. We can't just call the ->done() method of the work items because that will lock up due to the locking in cfg80211. This fix isn't very clean, but that seems acceptable since I have patches pending to remove this code completely. Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com> Tested-by: Pontus Fuchs <pontus.fuchs@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-01-25mac80211: revert on-channel work optimisationsJohannes Berg
commit e76aadc572288a158ae18ae1c10fe395c7bca066 upstream. Backport note: This patch it's a full revert of commit b23b025f "mac80211: Optimize scans on current operating channel.". On upstrem revert e76aadc5 we keep some bits from that commit, which are needed for upstream version of mac80211. The on-channel work optimisations have caused a number of issues, and the code is unfortunately very complex and almost impossible to follow. Instead of attempting to put in more workarounds let's just remove those optimisations, we can work on them again later, after we change the whole auth/assoc design. This should fix rate_control_send_low() warnings, see RH bug 731365. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25svcrpc: avoid memory-corruption on pool shutdownJ. Bruce Fields
commit b4f36f88b3ee7cf26bf0be84e6c7fc15f84dcb71 upstream. Socket callbacks use svc_xprt_enqueue() to add an xprt to a pool->sp_sockets list. In normal operation a server thread will later come along and take the xprt off that list. On shutdown, after all the threads have exited, we instead manually walk the sv_tempsocks and sv_permsocks lists to find all the xprt's and delete them. So the sp_sockets lists don't really matter any more. As a result, we've mostly just ignored them and hoped they would go away. Which has gotten us into trouble; witness for example ebc63e531cc6 "svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben Greear noticing that a still-running svc_xprt_enqueue() could re-add an xprt to an sp_sockets list just before it was deleted. The fix was to remove it from the list at the end of svc_delete_xprt(). But that only made corruption less likely--I can see nothing that prevents a svc_xprt_enqueue() from adding another xprt to the list at the same moment that we're removing this xprt from the list. In fact, despite the earlier xpo_detach(), I don't even see what guarantees that svc_xprt_enqueue() couldn't still be running on this xprt. So, instead, note that svc_xprt_enqueue() essentially does: lock sp_lock if XPT_BUSY unset add to sp_sockets unlock sp_lock So, if we do: set XPT_BUSY on every xprt. Empty every sp_sockets list, under the sp_socks locks. Then we're left knowing that the sp_sockets lists are all empty and will stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under the sp_lock and see it set. And *then* we can continue deleting the xprt's. (Thanks to Jeff Layton for being correctly suspicious of this code....) Cc: Ben Greear <greearb@candelatech.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25svcrpc: destroy server sockets all at onceJ. Bruce Fields
commit 2fefb8a09e7ed251ae8996e0c69066e74c5aa560 upstream. There's no reason I can see that we need to call sv_shutdown between closing the two lists of sockets. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25svcrpc: fix double-free on shutdown of nfsd after changing pool modeJ. Bruce Fields
commit 61c8504c428edcebf23b97775a129c5b393a302b upstream. The pool_to and to_pool fields of the global svc_pool_map are freed on shutdown, but are initialized in nfsd startup only in the SVC_POOL_PERCPU and SVC_POOL_PERNODE cases. They *are* initialized to zero on kernel startup. So as long as you use only SVC_POOL_GLOBAL (the default), this will never be a problem. You're also OK if you only ever use SVC_POOL_PERCPU or SVC_POOL_PERNODE. However, the following sequence events leads to a double-free: 1. set SVC_POOL_PERCPU or SVC_POOL_PERNODE 2. start nfsd: both fields are initialized. 3. shutdown nfsd: both fields are freed. 4. set SVC_POOL_GLOBAL 5. start nfsd: the fields are left untouched. 6. shutdown nfsd: now we try to free them again. Step 4 is actually unnecessary, since (for some bizarre reason), nfsd automatically resets the pool mode to SVC_POOL_GLOBAL on shutdown. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25mac80211: fix rx->key NULL pointer dereference in promiscuous modeStanislaw Gruszka
commit 1140afa862842ac3e56678693050760edc4ecde9 upstream. Since: commit 816c04fe7ef01dd9649f5ccfe796474db8708be5 Author: Christian Lamparter <chunkeey@googlemail.com> Date: Sat Apr 30 15:24:30 2011 +0200 mac80211: consolidate MIC failure report handling is possible to that we dereference rx->key == NULL when driver set RX_FLAG_MMIC_STRIPPED and not RX_FLAG_IV_STRIPPED and we are in promiscuous mode. This happen with rt73usb and rt61pci at least. Before the commit we always check rx->key against NULL, so I assume fix should be done in mac80211 (also mic_fail path has similar check). References: https://bugzilla.redhat.com/show_bug.cgi?id=769766 http://rt2x00.serialmonkey.com/pipermail/users_rt2x00.serialmonkey.com/2012-January/004395.html Reported-by: Stuart D Gathman <stuart@gathman.org> Reported-by: Kai Wohlfahrt <kai.scorpio@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25NFSv4: include bitmap in nfsv4 get acl dataAndy Adamson
commit bf118a342f10dafe44b14451a1392c3254629a1f upstream. The NFSv4 bitmap size is unbounded: a server can return an arbitrary sized bitmap in an FATTR4_WORD0_ACL request. Replace using the nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data xdr length to the (cached) acl page data. This is a general solution to commit e5012d1f "NFSv4.1: update nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-12igmp: Avoid zero delay when receiving odd mixture of IGMP queriesBen Hutchings
commit a8c1f65c79cbbb2f7da782d4c9d15639a9b94b27 upstream. Commit 5b7c84066733c5dfb0e4016d939757b38de189e4 ('ipv4: correct IGMP behavior on v3 query during v2-compatibility mode') added yet another case for query parsing, which can result in max_delay = 0. Substitute a value of 1, as in the usual v3 case. Reported-by: Simon McVittie <smcv@debian.org> References: http://bugs.debian.org/654876 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-01-03Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth
2012-01-03sch_qfq: fix overflow in qfq_update_start()Eric Dumazet
grp->slot_shift is between 22 and 41, so using 32bit wide variables is probably a typo. This could explain QFQ hangs Dave reported to me, after 2^23 packets ? (23 = 64 - 41) Reported-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-31netfilter: ctnetlink: fix timeout calculationXi Wang
The sanity check (timeout < 0) never works; the dividend is unsigned and so is the division, which should have been a signed division. long timeout = (ct->timeout.expires - jiffies) / HZ; if (timeout < 0) timeout = 0; This patch converts the time values to signed for the division. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-31ipvs: try also real server with port 0 in backup serverJulian Anastasov
We should not forget to try for real server with port 0 in the backup server when processing the sync message. We should do it in all cases because the backup server can use different forwarding method. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-27packet: fix possible dev refcnt leak when bind failWei Yongjun
If bind is fail when bind is called after set PACKET_FANOUT sock option, the dev refcnt will leak. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-24Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-12-24netem: dont call vfree() under spinlock and BH disabledEric Dumazet
commit 6373a9a286 (netem: use vmalloc for distribution table) added a regression, since vfree() is called while holding a spinlock and BH being disabled. Fix this by doing the pointers swap in critical section, and freeing after spinlock release. Also add __GFP_NOWARN to the kmalloc() try, since we fallback to vmalloc(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-24netfilter: ctnetlink: fix scheduling while atomic if helper is autoloadedPablo Neira Ayuso
This patch fixes one scheduling while atomic error: [ 385.565186] ctnetlink v0.93: registering with nfnetlink. [ 385.565349] BUG: scheduling while atomic: lt-expect_creat/16163/0x00000200 It can be triggered with utils/expect_create included in libnetfilter_conntrack if the FTP helper is not loaded. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-24netfilter: ctnetlink: fix return value of ctnetlink_get_expect()Pablo Neira Ayuso
This fixes one bogus error that is returned to user-space: libnetfilter_conntrack/utils# ./expect_get TEST: get expectation (-1)(Unknown error 18446744073709551504) This patch includes the correct handling for EAGAIN (nfnetlink uses this error value to restart the operation after module auto-loading). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23Revert "Bluetooth: Increase HCI reset timeout in hci_dev_do_close"Gustavo F. Padovan
This reverts commit e1b6eb3ccb0c2a34302a9fd87dd15d7b86337f23. This was causing a delay of 10 seconds in the resume process of a Thinkpad laptop. I'm afraid this could affect more devices once 3.2 is released. Reported-by: Tomáš Janoušek <tomi@nomi.cz> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-12-23Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-12-23netfilter: xt_connbytes: handle negation correctlyFlorian Westphal
"! --connbytes 23:42" should match if the packet/byte count is not in range. As there is no explict "invert match" toggle in the match structure, userspace swaps the from and to arguments (i.e., as if "--connbytes 42:23" were given). However, "what <= 23 && what >= 42" will always be false. Change things so we use "||" in case "from" is larger than "to". This change may look like it breaks backwards compatibility when "to" is 0. However, older iptables binaries will refuse "connbytes 42:0", and current releases treat it to mean "! --connbytes 0:42", so we should be fine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>