summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2007-01-26[Bluetooth] Fix deadlock in the L2CAP layerMarcel Holtmann
The Bluetooth L2CAP layer has 2 locks that are used in softirq context, (one spinlock and one rwlock, where the softirq usage is readlock) but where not all usages of the lock were _bh safe. The patch below corrects this. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-26[Bluetooth] Add locking for bt_proto array manipulationMarcel Holtmann
The bt_proto array needs to be protected by some kind of locking to prevent a race condition between bt_sock_create and bt_sock_register. And in addition all calls to sk_alloc need to be made GFP_ATOMIC now. Signed-off-by: Masatake YAMATO <jet@gyve.org> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-26[Bluetooth] Fix compat ioctl for BNEP, CMTP and HIDPMarcel Holtmann
There exists no attempt do deal with the fact that a structure with a uint32_t followed by a pointer is going to be different for 32-bit and 64-bit userspace. Any 32-bit process trying to use it will be failing with -EFAULT if it's lucky; suffering from having data dumped at a random address if it's not. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-26[Bluetooth] Handle command complete event for exit periodic inquiryMarcel Holtmann
The command complete event of the exit periodic inquiry command must clear the HCI_INQUIRY flag and finish the HCI request. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-26[Bluetooth] Return EINPROGRESS for non-blocking socket callsMarcel Holtmann
In case of non-blocking socket calls we should return EINPROGRESS and not EAGAIN. Signed-off-by: Ulisses Furquim <ulissesf@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-25[Bluetooth] Fix uninitialized return value for RFCOMM sendmsg()Marcel Holtmann
When calling send() with a zero length parameter on a RFCOMM socket it returns a positive value. In this rare case the variable err is used uninitialized and unfortunately its value is returned. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-25[Bluetooth] More checks if DLC is still attached to the TTYMarcel Holtmann
If the DLC device is no longer attached to the TTY device, then return errors or default values for various callbacks of the TTY layer. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-25BLUETOOTH: Fix unaligned access in hci_send_to_sock.David S. Miller
The "u16 *" derefs of skb->data need to be wrapped inside of a get_unaligned(). Thanks to Gustavo Zacarias for the bug report. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-25[Bluetooth] Check if DLC is still attached to the TTYMarcel Holtmann
If the DLC device is no longer attached to the TTY device, then it makes no sense to go through with changing the termios settings. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-24[Bluetooth] Let BT_HIDP depend on INPUTAdrian Bunk
This patch lets BT_HIDP depend on instead of select INPUT. This fixes the following warning during an s390 build: net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refer to undefined symbol 'INPUT' A dependency on INPUT also implies !S390 (and therefore makes the explicit dependency obsolete) since INPUT is not available on s390. The practical difference should be nearly zero, since INPUT is always set to y unless EMBEDDED=y (or S390=y). Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-22NETFILTER: arp_tables: missing unregistration on module unloadPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-20NETFILTER: NAT: fix NOTRACK checksum handlingPatrick McHardy
The whole idea with the NOTRACK netfilter target is that you can force the netfilter code to avoid connection tracking, and all costs assosciated with it, by making traffic match a NOTRACK rule. But this is totally broken by the fact that we do a checksum calculation over the packet before we do the NOTRACK bypass check, which is very expensive. People setup NOTRACK rules explicitly to avoid all of these kinds of costs. This patch from Patrick, already in Linus's tree, fixes the bug. Move the check for ip_conntrack_untracked before the call to skb_checksum_help to fix NOTRACK excemptions from NAT. Pre-2.6.19 NAT code breaks TSO by invalidating hardware checksums for every packet, even if explicitly excluded from NAT through NOTRACK. 2.6.19 includes a fix that makes NAT and TSO live in harmony, but the performance degradation caused by this deserves making at least the workaround work properly in -stable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-18[IPV6] Fix joining all-node multicast group.YOSHIFUJI Hideaki
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09ebtables: check struct type before computing gapChuck Ebbert
Check struct type before dereferencing fields in ebt_entry. Failure to check can cause oops. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09TCP: Fix and simplify microsecond rtt samplingJohn Heffner
This changes the microsecond RTT sampling so that samples are taken in the same way that RTT samples are taken for the RTO calculator: on the last segment acknowledged, and only when the segment hasn't been retransmitted. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09[IPV4/IPV6]: Fix inet{,6} device initialization order.David L Stevens
It is important that we only assign dev->ip{,6}_ptr only after all portions of the inet{,6} are setup. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09Bluetooth: Add packet size checks for CAPI messages (CVE-2006-6106)Marcel Holtmann
With malformed packets it might be possible to overwrite internal CMTP and CAPI data structures. This patch adds additional length checks to prevent these kinds of remote attacks. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-04[PKTGEN]: Fix module load/unload races.Robert Olsson
Adrian Bunk: Backported to 2.6.16. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-04NET_SCHED: Fix fallout from dev->qdisc RCU changePatrick McHardy
The move of qdisc destruction to a rcu callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur. The two assumptions were: - since changes only happen in process context, read_lock doesn't need bottem half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottem half context. - since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (f.e. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free. Instead of "fixing" this by disabling bottem halfs everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the rcu callback to make sure dev_queue_xmit doesn't access already freed memory. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-18bridge-netfilter: don't overwrite memory outside of skbStephen Hemminger
The bridge netfilter code needs to check for space at the front of the skb before overwriting; otherwise if skb from device doesn't have headroom, then it will cause random memory corruption. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-17[IPV4] ip_fragment: Always compute hash with ipfrag_lock held.David S. Miller
Otherwise we could compute an inaccurate hash due to the random seed changing. Noticed by Zach Brown and patch is based upon some feedback from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-15Fix SUNRPC wakeup/execute race conditionChristophe Saout
The sunrpc scheduler contains a race condition that can let an RPC task end up being neither running nor on any wait queue. The race takes place between rpc_make_runnable (called from rpc_wake_up_task) and __rpc_execute under the following condition: First __rpc_execute calls tk_action which puts the task on some wait queue. The task is dequeued by another process before __rpc_execute continues its execution. While executing rpc_make_runnable exactly after setting the task `running' bit and before clearing the `queued' bit __rpc_execute picks up execution, clears `running' and subsequently both functions fall through, both under the false assumption somebody else took the job. Swapping rpc_test_and_set_running with rpc_clear_queued in rpc_make_runnable fixes that hole. This introduces another possible race condition that can be handled by checking for `queued' after setting the `running' bit. Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server on the same physical machine, apparently one of the few ways to hit this race condition at all. Signed-off-by: Christophe Saout <christophe@saout.de> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-14[IPV4]: severe locking bug in fib_semantics.cAlexey Kuznetsov
Found in 2.4 by Yixin Pan <yxpan@hotmail.com>. > When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock) = > is used in fib_release_info() instead of write_lock_bh(&fib_info_lock). = > Is the following case possible: a BH interrupts fib_release_info() while = > holding the write lock, and calls ip_check_fib_default() which calls = > read_lock(&fib_info_lock), and spin forever. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-09[IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.David S. Miller
We grab a reference to the route's inetpeer entry but forget to release it in xfrm4_dst_destroy(). Bug discovered by Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-09[XFRM]: Use output device disable_xfrm for forwarded packetsPatrick McHardy
Currently the behaviour of disable_xfrm is inconsistent between locally generated and forwarded packets. For locally generated packets disable_xfrm disables the policy lookup if it is set on the output device, for forwarded traffic however it looks at the input device. This makes it impossible to disable xfrm on all devices but a dummy device and use normal routing to direct traffic to that device. Always use the output device when checking disable_xfrm. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04bridge: fix possible overflow in get_fdb_entries (CVE-2006-5751)Chris Wright
Make sure to properly clamp maxnum to avoid overflow (CVE-2006-5751). Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[EBTABLES]: Prevent wraparounds in checks for entry components' sizes.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[EBTABLES]: Deal with the worst-case behaviour in loop checks.Al Viro
No need to revisit a chain we'd already finished with during the check for current hook. It's either instant loop (which we'd just detected) or a duplicate work. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[EBTABLES]: Verify that ebt_entries have zero ->distinguisher.Al Viro
We need that for iterator to work; existing check had been too weak. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[EBTABLES]: Fix wraparounds in ebt_entries verification.Al Viro
We need to verify that a) we are not too close to the end of buffer to dereference b) next entry we'll be checking won't be _before_ our While we are at it, don't subtract unrelated pointers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[NET_SCHED]: policer: restore compatibility with old iproute binariesPatrick McHardy
The tc actions increased the size of struct tc_police, which broke compatibility with old iproute binaries since both the act_police and the old NET_CLS_POLICE code check for an exact size match. Since the new members are not even used, the simple fix is to also accept the size of the old structure. Dumping is not affected since old userspace will receive a bigger structure, which is handled fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[PKT_SCHED] act_gact: division by zeroKim Nordlund
Not returning -EINVAL, because someone might want to use the value zero in some future gact_prob algorithm? Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04[IPV6]: Fix address/interface handling in UDP and DCCP, according to the ↵YOSHIFUJI Hideaki
scoping architecture. TCP and RAW do not have this issue. Closes Bug #7432. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04remove garbage the sneaked into the ext3 fixAdrian Bunk
Spotted by Thomas Voegtle. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29SCTP: Always linearise packet on inputHerbert Xu
I was looking at a RHEL5 bug report involving Xen and SCTP (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212550). It turns out that SCTP wasn't written to handle skb fragments at all. The absence of any calls to skb_may_pull is testament to that. It just so happens that Xen creates fragmented packets more often than other scenarios (header & data split when going from domU to dom0). That's what caused this bug to show up. Until someone has the time sits down and audits the entire net/sctp directory, here is a conservative and safe solution that simply linearises all packets on input. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29add forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)Al Viro
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of buffer size from address of buffer_head is a bad idea - it will generate junk in any case, may oops if buffer_head is close to the end of slab page and next page is not mapped and isn't what was intended there. IOW, ->b_data is missing in that call. Fortunately, result doesn't go into the primary on-disk data structures, so only backup ones get crap written to them; that had allowed this bug to remain unnoticed until now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29[UDP]: Make udp_encap_rcv use pskb_may_pullOlaf Kirch
Make udp_encap_rcv use pskb_may_pull IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset, when header split is enabled. When receiving sufficiently large packets, the driver puts everything up to and including the UDP header into the header portion of the skb, and the rest goes into the paged part. udp_encap_rcv forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it passes it up it to the IKE daemon. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24[IPX]: Annotate and fix IPX checksumAl Viro
Calculation of IPX checksum got buggered about 2.4.0. The old variant mangled the packet; that got fixed, but calculation itself got buggered. Restored the correct logics, fixed a subtle breakage we used to have even back then: if the sum is 0 mod 0xffff, we want to return 0, not 0xffff. The latter has special meaning for IPX (cheksum disabled). Observation (and obvious fix) nicked from history of FreeBSD ipx_cksum.c... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24[IPX]: Fix typo, ipxhdr() --> ipx_hdr()David S. Miller
Noticed by Dave Jones. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24[IPX]: Another nonlinear receive fixStephen Hemminger
Need to check some more cases in IPX receive. If the skb is purely fragments, the IPX header needs to be extracted. The function pskb_may_pull() may in theory invalidate all the pointers in the skb, so references to ipx header must be refreshed. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24[IPX]: Header length validation neededStephen Hemminger
This patch will linearize and check there is enough data. It handles the pprop case as well as avoiding a whole audit of the routing code. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-24[IPX]: Correct return type of ipx_map_frame_type().Alexey Dobriyan
Casting BE16 to int and back may or may not work. Correct, to be sure. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-20[RTNETLINK]: Fix IFLA_ADDRESS handling.David Miller
The ->set_mac_address handlers expect a pointer to a sockaddr which contains the MAC address, whereas IFLA_ADDRESS provides just the MAC address itself. So whip up a sockaddr to wrap around the netlink attribute for the ->set_mac_address call. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-17Fix timer race in dst GC codeDmitry Mishin
Replace add_timer() by mod_timer() in dst_run_gc in order to avoid BUG message. CPU1 CPU2 dst_run_gc() entered dst_run_gc() entered spin_lock(&dst_lock) ..... del_timer(&dst_gc_timer) fail to get lock .... mod_timer() <--- puts timer back to the list add_timer(&dst_gc_timer) <--- BUG because timer is in list already. Found during OpenVZ internal testing. At first we thought that it is OpenVZ specific as we added dst_run_gc(0) call in dst_dev_event(), but as Alexey pointed to me it is possible to trigger this condition in mainstream kernel. F.e. timer has fired on CPU2, but the handler was preeempted by an irq before dst_lock is tried. Meanwhile, someone on CPU1 adds an entry to gc list and starts the timer. If CPU2 was preempted long enough, this timer can expire simultaneously with resuming timer handler on CPU1, arriving exactly to the situation described. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-11[NET]: Update frag_list in pskb_trimHerbert Xu
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of the packet, the frag_list is not updated to reflect the trimming. This will usually work fine until you hit something that uses the packet length or tail from the frag_list. Examples include esp_output and ip_fragment. Another problem caused by this is that you can end up with a linear packet with a frag_list attached. It is possible to get away with this if we audit everything to make sure that they always consult skb->len before going down onto frag_list. In fact we can do the samething for the paged part as well to avoid copying the data area of the skb. For now though, let's do the conservative fix and update frag_list. Many thanks to Marco Berizzi for helping me to track down this bug. This 4-year old bug took 3 months to track down. Marco was very patient indeed :) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09[NET]: __alloc_pages() failures reported due to fragmentationLarry Woodman
We have seen a couple of __alloc_pages() failures due to fragmentation, there is plenty of free memory but no large order pages available. I think the problem is in sock_alloc_send_pskb(), the gfp_mask includes __GFP_REPEAT but its never used/passed to the page allocator. Shouldnt the gfp_mask be passed to alloc_skb() ? Signed-off-by: Larry Woodman <lwoodman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09[NET]: Set truesize in pskb_copyHerbert Xu
Since pskb_copy tacks on the non-linear bits from the original skb, it needs to count them in the truesize field of the new skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09[TCP]: Don't use highmem in tcp hash size calculation.John Heffner
This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-08[IPV4]: Limit rt cache size properly.Kirill Korotaev
During OpenVZ stress testing we found that UDP traffic with random src can generate too much excessive rt hash growing leading finally to OOM and kernel panics. It was found that for 4GB i686 system (having 1048576 total pages and 225280 normal zone pages) kernel allocates the following route hash: syslog: IP route cache hash table entries: 262144 (order: 8, 1048576 bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is 4194304 * 256b = 1Gb of RAM > normal_zone Attached the patch which removes HASH_HIGHMEM flag from alloc_large_system_hash() call. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-08[NET]: Add missing UFO initialisationsHerbert Xu
This bug was unknowingly fixed the GSO patches (or rather, its effect was unknown at the time). Thanks to Marco Berizzi's persistence which is documented in the thread "ipsec tunnel asymmetrical mtu", we now know that it can have highly non-obvious symptoms. What happens is that uninitialised uso_size fields can cause packets to be incorrectly identified as UFO, which means that it does not get fragmented even if it's over the MTU. The fix is simple enough. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>