summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2007-02-26[TCP]: Prevent pseudo garbage in SYN's advertized windowIlpo Järvinen
TCP may advertize up to 16-bits window in SYN packets (no window scaling allowed). At the same time, TCP may have rcv_wnd (32-bits) that does not fit to 16-bits without window scaling resulting in pseudo garbage into advertized window from the low-order bits of rcv_wnd. This can happen at least when mss <= (1<<wscale) (see tcp_select_initial_window). This patch fixes the handling of SYN advertized windows (compile tested only). In worst case (which is unlikely to occur though), the receiver advertized window could be just couple of bytes. I'm not sure that such situation would be handled very well at all by the receiver!? Fortunately, the situation normalizes after the first non-SYN ACK is received because it has the correct, scaled window. Alternatively, tcp_select_initial_window could be changed to prevent too large rcv_wnd in the first place. [ tcp_make_synack() has the same bug, and I've added a fix for that to this patch -DaveM ] Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-26[IPV4/IPV6] multicast: Check add_grhead() return valueAlexey Dobriyan
add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb from it passed to skb_put() without checking. Adrian Bunk: backported to 2.6.16 Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14[TCP]: struct tcp_sack_block annotationsAl Viro
Some of the instances of tcp_sack_block are host-endian, some - net-endian. Define struct tcp_sack_block_wire identical to struct tcp_sack_block with u32 replaced with __be32; annotate uses of tcp_sack_block replacing net-endian ones with tcp_sack_block_wire. Change is obviously safe since for cc(1) __be32 is typedefed to u32. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14[NETFILTER]: Clear GSO bits for TCP reset packetHerbert Xu
The TCP reset packet is copied from the original. This includes all the GSO bits which do not apply to the new packet. So we should clear those bits. Spotted by Patrick McHardy. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14[TCP]: Don't apply FIN exception to full TSO segments.John Heffner
Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14TCP: skb is unexpectedly freed.Masayuki Nakagawa
I encountered a kernel panic with my test program, which is a very simple IPv6 client-server program. The server side sets IPV6_RECVPKTINFO on a listening socket, and the client side just sends a message to the server. Then the kernel panic occurs on the server. (If you need the test program, please let me know. I can provide it.) This problem happens because a skb is forcibly freed in tcp_rcv_state_process(). When a socket in listening state(TCP_LISTEN) receives a syn packet, then tcp_v6_conn_request() will be called from tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully returns, the skb would be discarded by __kfree_skb(). However, in case of a listening socket which was already set IPV6_RECVPKTINFO, an address of the skb will be stored in treq->pktopts and a ref count of the skb will be incremented in tcp_v6_conn_request(). But, even if the skb is still in use, the skb will be freed. Then someone still using the freed skb will cause the kernel panic. I suggest to use kfree_skb() instead of __kfree_skb(). Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-14TCP: Fix sorting of SACK blocks.Baruch Even
The sorting of SACK blocks actually munges them rather than sort, causing the TCP stack to ignore some SACK information and breaking the assumption of ordered SACK blocks after sorting. The sort takes the data from a second buffer which isn't moved causing subsequent data moves to occur from the wrong location. The fix is to use a temporary buffer as a normal sort does. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-22NETFILTER: arp_tables: missing unregistration on module unloadPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-20NETFILTER: NAT: fix NOTRACK checksum handlingPatrick McHardy
The whole idea with the NOTRACK netfilter target is that you can force the netfilter code to avoid connection tracking, and all costs assosciated with it, by making traffic match a NOTRACK rule. But this is totally broken by the fact that we do a checksum calculation over the packet before we do the NOTRACK bypass check, which is very expensive. People setup NOTRACK rules explicitly to avoid all of these kinds of costs. This patch from Patrick, already in Linus's tree, fixes the bug. Move the check for ip_conntrack_untracked before the call to skb_checksum_help to fix NOTRACK excemptions from NAT. Pre-2.6.19 NAT code breaks TSO by invalidating hardware checksums for every packet, even if explicitly excluded from NAT through NOTRACK. 2.6.19 includes a fix that makes NAT and TSO live in harmony, but the performance degradation caused by this deserves making at least the workaround work properly in -stable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09TCP: Fix and simplify microsecond rtt samplingJohn Heffner
This changes the microsecond RTT sampling so that samples are taken in the same way that RTT samples are taken for the RTO calculator: on the last segment acknowledged, and only when the segment hasn't been retransmitted. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-01-09[IPV4/IPV6]: Fix inet{,6} device initialization order.David L Stevens
It is important that we only assign dev->ip{,6}_ptr only after all portions of the inet{,6} are setup. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-17[IPV4] ip_fragment: Always compute hash with ipfrag_lock held.David S. Miller
Otherwise we could compute an inaccurate hash due to the random seed changing. Noticed by Zach Brown and patch is based upon some feedback from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-14[IPV4]: severe locking bug in fib_semantics.cAlexey Kuznetsov
Found in 2.4 by Yixin Pan <yxpan@hotmail.com>. > When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock) = > is used in fib_release_info() instead of write_lock_bh(&fib_info_lock). = > Is the following case possible: a BH interrupts fib_release_info() while = > holding the write lock, and calls ip_check_fib_default() which calls = > read_lock(&fib_info_lock), and spin forever. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-09[IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.David S. Miller
We grab a reference to the route's inetpeer entry but forget to release it in xfrm4_dst_destroy(). Bug discovered by Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-09[XFRM]: Use output device disable_xfrm for forwarded packetsPatrick McHardy
Currently the behaviour of disable_xfrm is inconsistent between locally generated and forwarded packets. For locally generated packets disable_xfrm disables the policy lookup if it is set on the output device, for forwarded traffic however it looks at the input device. This makes it impossible to disable xfrm on all devices but a dummy device and use normal routing to direct traffic to that device. Always use the output device when checking disable_xfrm. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-04remove garbage the sneaked into the ext3 fixAdrian Bunk
Spotted by Thomas Voegtle. Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29add forgotten ->b_data in memcpy() call in ext3/resize.c (oopsable)Al Viro
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of buffer size from address of buffer_head is a bad idea - it will generate junk in any case, may oops if buffer_head is close to the end of slab page and next page is not mapped and isn't what was intended there. IOW, ->b_data is missing in that call. Fortunately, result doesn't go into the primary on-disk data structures, so only backup ones get crap written to them; that had allowed this bug to remain unnoticed until now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-29[UDP]: Make udp_encap_rcv use pskb_may_pullOlaf Kirch
Make udp_encap_rcv use pskb_may_pull IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset, when header split is enabled. When receiving sufficiently large packets, the driver puts everything up to and including the UDP header into the header portion of the skb, and the rest goes into the paged part. udp_encap_rcv forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it passes it up it to the IKE daemon. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-09[TCP]: Don't use highmem in tcp hash size calculation.John Heffner
This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-08[IPV4]: Limit rt cache size properly.Kirill Korotaev
During OpenVZ stress testing we found that UDP traffic with random src can generate too much excessive rt hash growing leading finally to OOM and kernel panics. It was found that for 4GB i686 system (having 1048576 total pages and 225280 normal zone pages) kernel allocates the following route hash: syslog: IP route cache hash table entries: 262144 (order: 8, 1048576 bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is 4194304 * 256b = 1Gb of RAM > normal_zone Attached the patch which removes HASH_HIGHMEM flag from alloc_large_system_hash() call. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-05fix RARP ic_servaddr breakageAl Viro
memcpy 4 bytes to address of auto unsigned long variable followed by comparison with u32 is a bloody bad idea. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-08-30ip_tables: fix table locking in ipt_do_tablePatrick McHardy
table->private might change because of ruleset changes, don't use it without holding the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-08-26ulog: fix panic on SMP kernelsMark Huang
Fix kernel panic on various SMP machines. The culprit is a null ub->skb in ulog_send(). If ulog_timer() has already been scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the queue on another CPU by calling ulog_send() right before it exits, there will be no skbuff when ulog_timer() acquires the lock and calls ulog_send(). Cancelling the timer in ulog_send() doesn't help because it has already been scheduled and is running on the first CPU. Similar problem exists in ebt_ulog.c and nfnetlink_log.c. Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30[PATCH] NETFILTER: SCTP conntrack: fix crash triggered by packet without ↵Patrick McHardy
chunks [CVE-2006-2934] When a packet without any chunks is received, the newconntrack variable in sctp_packet contains an out of bounds value that is used to look up an pointer from the array of timeouts, which is then dereferenced, resulting in a crash. Make sure at least a single chunk is present. Problem noticed by George A. Theall <theall@tenablesecurity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-30[PATCH] NETFILTER: Fix small information leak in SO_ORIGINAL_DST (CVE-2006-1343)Marcel Holtmann
It appears that sockaddr_in.sin_zero is not zeroed during getsockopt(...SO_ORIGINAL_DST...) operation. This can lead to an information leak (CVE-2006-1343). Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2006-05-22[PATCH] NETFILTER: SNMP NAT: fix memory corruption (CVE-2006-2444)Patrick McHardy
CVE-2006-2444 - Potential remote DoS in SNMP NAT helper. Fix memory corruption caused by snmp_trap_decode: - When snmp_trap_decode fails before the id and address are allocated, the pointers contain random memory, but are freed by the caller (snmp_parse_mangle). - When snmp_trap_decode fails after allocating just the ID, it tries to free both address and ID, but the address pointer still contains random memory. The caller frees both ID and random memory again. - When snmp_trap_decode fails after allocating both, it frees both, and the callers frees both again. The corruption can be triggered remotely when the ip_nat_snmp_basic module is loaded and traffic on port 161 or 162 is NATed. Found by multiple testcases of the trap-app and trap-enc groups of the PROTOS c06-snmpv1 testsuite. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2006-05-20[PATCH] Netfilter: do_add_counters race, possible oops or info leak ↵Chris Wright
(CVE-2006-0039) Solar Designer found a race condition in do_add_counters(). The beginning of paddc is supposed to be the same as tmp which was sanity-checked above, but it might not be the same in reality. In case the integer overflow and/or the race condition are triggered, paddc->num_counters might not match the allocation size for paddc. If the check below (t->private->number != paddc->num_counters) nevertheless passes (perhaps this requires the race condition to be triggered), IPT_ENTRY_ITERATE() would read kernel memory beyond the allocation size, potentially causing an oops or leaking sensitive data (e.g., passwords from host system or from another VPS) via counter increments. This requires CAP_NET_ADMIN. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=191698 Cc: Solar Designer <solar@openwall.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Patrick McHardy <kaber@trash.net> (chrisw: rebase of Kirill's patch to 2.6.16.16) Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-05-02[PATCH] NETFILTER: SCTP conntrack: fix infinite loop (CVE-2006-1527)Patrick McHardy
[NETFILTER]: SCTP conntrack: fix infinite loop fix infinite loop in the SCTP-netfilter code: check SCTP chunk size to guarantee progress of for_each_sctp_chunk(). (all other uses of for_each_sctp_chunk() are preceded by do_basic_checks(), so this fix should be complete.) Based on patch from Ingo Molnar <mingo@elte.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-24[PATCH] Fix truesize underflowHerbert Xu
[TCP]: Fix truesize underflow There is a problem with the TSO packet trimming code. The cause of this lies in the tcp_fragment() function. When we allocate a fragment for a completely non-linear packet the truesize is calculated for a payload length of zero. This means that truesize could in fact be less than the real payload length. When that happens the TSO packet trimming can cause truesize to become negative. This in turn can cause sk_forward_alloc to be -n * PAGE_SIZE which would trigger the warning. I've copied the code DaveM used in tso_fragment which should work here. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-18[PATCH] ip_route_input panic fix (CVE-2006-1525)Stephen Hemminger
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=6388 The bug is caused by ip_route_input dereferencing skb->nh.protocol of the dummy skb passed dow from inet_rtm_getroute (Thanks Thomas for seeing it). It only happens if the route requested is for a multicast IP address. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-17[PATCH] NETFILTER: Fix fragmentation issues with bridge netfilterPatrick McHardy
[NETFILTER]: Fix fragmentation issues with bridge netfilter The conntrack code doesn't do re-fragmentation of defragmented packets anymore but relies on fragmentation in the IP layer. Purely bridged packets don't pass through the IP layer, so the bridge netfilter code needs to take care of fragmentation itself. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07[PATCH] fib_trie.c node freeing fixDavid S. Miller
Please apply to 2.6.{14,15,16} -stable, thanks a lot. From: Robert Olsson <robert.olsson@its.uu.se> [FIB_TRIE]: Fix leaf freeing. Seems like leaf (end-nodes) has been freed by __tnode_free_rcu and not by __leaf_free_rcu. This fixes the problem. Only tnode_free is now used which checks for appropriate node type. free_leaf can be removed. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-07[PATCH] {ip, nf}_conntrack_netlink: fix expectation notifier unregistrationMartin Josefsson
[NETFILTER]: {ip,nf}_conntrack_netlink: fix expectation notifier unregistration This patch fixes expectation notifier unregistration on module unload to use ip_conntrack_expect_unregister_notifier(). This bug causes a soft lockup at the first expectation created after a rmmod ; insmod of this module. Should go into -stable as well. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-27[PATCH] TCP: Do not use inet->id of global tcp_socket when sending RST ↵Alexey Kuznetsov
(CVE-2006-1242) The problem is in ip_push_pending_frames(), which uses: if (!df) { __ip_select_ident(iph, &rt->u.dst, 0); } else { iph->id = htons(inet->id++); } instead of ip_select_ident(). Right now I think the code is a nonsense. Most likely, I copied it from old ip_build_xmit(), where it was really special, we had to decide whether to generate unique ID when generating the first (well, the last) fragment. In ip_push_pending_frames() it does not make sense, it should use plain ip_select_ident() instead. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-12[NETFILTER]: arp_tables: fix NULL pointer dereferencePatrick McHardy
The check is wrong and lets NULL-ptrs slip through since !IS_ERR(NULL) is true. Coverity #190 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12[IPV4/6]: Fix UFO error propagationPatrick McHardy
When ufo_append_data fails err is uninitialized, but returned back. Strangely gcc doesn't notice it. Coverity #901 and #902 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12[TCP]: tcp_highspeed: fix AIMD table out-of-bounds accessPatrick McHardy
Covertiy #547 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11[TCP]: Fix tcp_tso_should_defer() when limit>=65536David S. Miller
That's >= a full sized TSO frame, so we should always return 0 in that case. Based upon a report and initial patch from Lachlan Andrew, final patch suggested by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-07[NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumptionThomas Graf
The size of the skb carrying the netlink message is not equivalent to the length of the actual netlink message due to padding. ip_queue matches the length of the payload against the original packet size to determine if packet mangling is desired, due to the above wrong assumption arbitary packets may not be mangled depening on their original size. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27[NETFILTER]: Restore {ipt,ip6t,ebt}_LOG compatibilityPatrick McHardy
The nfnetlink_log infrastructure changes broke compatiblity of the LOG targets. They currently use whatever log backend was registered first, which means that if ipt_ULOG was loaded first, no messages will be printed to the ring buffer anymore. Restore compatiblity by using the old log functions by default and only use the nf_log backend if the user explicitly said so. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27[IPSEC]: Kill post_input hook and do NAT-T in esp_input directlyHerbert Xu
The only reason post_input exists at all is that it gives us the potential to adjust the checksums incrementally in future which we ought to do. However, after thinking about it for a bit we can adjust the checksums without using this post_input stuff at all. The crucial point is that only the inner-most NAT-T SA needs to be considered when adjusting checksums. What's more, the checksum adjustment comes down to a single u32 due to the linearity of IP checksums. We just happen to have a spare u32 lying around in our skb structure :) When ip_summed is set to CHECKSUM_NONE on input, the value of skb->csum is currently unused. All we have to do is to make that the checksum adjustment and voila, there goes all the post_input and decap structures! I've left in the decap data structures for now since it's intricately woven into the sec_path stuff. We can kill them later too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27[IPSEC] esp: Kill unnecessary block and indentationHerbert Xu
We used to keep sg on the stack which is why the extra block was useful. We've long since stopped doing that so let's kill the block and save some indentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23[IPSEC]: Use TOS when doing tunnel lookupsHerbert Xu
We should use the TOS because it's one of the routing keys. It also means that we update the correct routing cache entry when PMTU occurs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23[IPV4]: Fix garbage collection of multipath route entriesSuresh Bhogavilli
When garbage collecting route cache entries of multipath routes in rt_garbage_collect(), entries were deleted from the hash bucket 'i' while holding a spin lock on bucket 'k' resulting in a system hang. Delete entries, if any, from bucket 'k' instead. Signed-off-by: Suresh Bhogavilli <sbhogavilli@verisign.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19[NETFILTER]: Fix outgoing redirects to loopbackPatrick McHardy
When redirecting an outgoing packet to loopback, it keeps the original conntrack reference and information from the outgoing path, which falsely triggers the check for DNAT on input and the dst_entry is released to trigger rerouting. ip_route_input refuses to route the packet because it has a local source address and it is dropped. Look at the packet itself to dermine if it was NATed. Also fix a missing inversion that causes unneccesary xfrm lookups. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19[NETFILTER]: Fix NAT PMTUD problemsPatrick McHardy
ICMP errors are only SNATed when their source matches the source of the connection they are related to, otherwise the source address is not changed. This creates problems with ICMP frag. required messages originating from a router behind the NAT, if private IPs are used the packet has a good change of getting dropped on the path to its destination. Always NAT ICMP errors similar to the original connection. Based on report by Al Viro. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15[NETFILTER]: nf_conntrack: move registration of __nf_ct_attachYasuyuki Kozakai
Move registration of __nf_ct_attach to nf_conntrack_core to make it usable for IPv6 connection tracking as well. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15[XFRM]: Fix SNAT-related crash in xfrm4_output_finishPatrick McHardy
When a packet matching an IPsec policy is SNATed so it doesn't match any policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish crash because of a NULL pointer dereference. This patch directs these packets to the original output path instead. Since the packets have already passed the POST_ROUTING hook, but need to start at the beginning of the original output path which includes another POST_ROUTING invocation, a flag is added to the IPCB to indicate that the packet was rerouted and doesn't need to pass the POST_ROUTING hook again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15[NETFILTER]: Fix xfrm lookup after SNATPatrick McHardy
To find out if a packet needs to be handled by IPsec after SNAT, packets are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This breaks SNAT of non-unicast packets to non-local addresses because the packet is routed as incoming packet and no neighbour entry is bound to the dst_entry. In general, it seems to be a bad idea to replace the dst_entry after the packet was already sent to the output routine because its state might not match what's expected. This patch changes the xfrm lookup in POST_ROUTING to re-use the original dst_entry without routing the packet again. This means no policy routing can be used for transport mode transforms (which keep the original route) when packets are SNATed to match the policy, but it looks like the best we can do for now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13[IPV4] ICMP: Invert default for invalid icmp msgs sysctlDave Jones
isic can trigger these msgs to be spewed at a very high rate. There's already a sysctl to turn them off. Given these messages aren't useful for most people, this patch disables them by default. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>