summaryrefslogtreecommitdiff
path: root/include/net/inetpeer.h
AgeCommit message (Collapse)Author
2011-12-09inet: add a redirect generation id in inetpeerEric Dumazet
[ Upstream commit de68dca1816660b0d3ac89fa59ffb410007a143f ] Now inetpeer is the place where we cache redirect information for ipv4 destinations, we must be able to invalidate informations when a route is added/removed on host. As inetpeer is not yet namespace aware, this patch adds a shared redirect_genid, and a per inetpeer redirect_genid. This might be changed later if inetpeer becomes ns aware. Cache information for one inerpeer is valid as long as its redirect_genid has the same value than global redirect_genid. Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-26atomic: use <linux/atomic.h>Arun Sharma
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-21ipv6: make fragment identifications less predictableEric Dumazet
IPv6 fragment identification generation is way beyond what we use for IPv4 : It uses a single generator. Its not scalable and allows DOS attacks. Now inetpeer is IPv6 aware, we can use it to provide a more secure and scalable frag ident generator (per destination, instead of system wide) This patch : 1) defines a new secure_ipv6_id() helper 2) extends inet_getid() to provide 32bit results 3) extends ipv6_select_ident() with a new dest parameter Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-08inetpeer: lower false sharing effectEric Dumazet
Profiles show false sharing in addr_compare() because refcnt/dtime changes dirty the first inet_peer cache line, where are lying the keys used at lookup time. If many cpus are calling inet_getpeer() and inet_putpeer(), or need frag ids, addr_compare() is in 2nd position in "perf top". Before patch, my udpflood bench (16 threads) on my 2x4x2 machine : 5784.00 9.7% csum_partial_copy_generic [kernel] 3356.00 5.6% addr_compare [kernel] 2638.00 4.4% fib_table_lookup [kernel] 2625.00 4.4% ip_fragment [kernel] 1934.00 3.2% neigh_lookup [kernel] 1617.00 2.7% udp_sendmsg [kernel] 1608.00 2.7% __ip_route_output_key [kernel] 1480.00 2.5% __ip_append_data [kernel] 1396.00 2.3% kfree [kernel] 1195.00 2.0% kmem_cache_free [kernel] 1157.00 1.9% inet_getpeer [kernel] 1121.00 1.9% neigh_resolve_output [kernel] 1012.00 1.7% dev_queue_xmit [kernel] # time ./udpflood.sh real 0m44.511s user 0m20.020s sys 11m22.780s # time ./udpflood.sh real 0m44.099s user 0m20.140s sys 11m15.870s After patch, no more addr_compare() in profiles : 4171.00 10.7% csum_partial_copy_generic [kernel] 1787.00 4.6% fib_table_lookup [kernel] 1756.00 4.5% ip_fragment [kernel] 1234.00 3.2% udp_sendmsg [kernel] 1191.00 3.0% neigh_lookup [kernel] 1118.00 2.9% __ip_append_data [kernel] 1022.00 2.6% kfree [kernel] 993.00 2.5% __ip_route_output_key [kernel] 841.00 2.2% neigh_resolve_output [kernel] 816.00 2.1% kmem_cache_free [kernel] 658.00 1.7% ia32_sysenter_target [kernel] 632.00 1.6% kmem_cache_alloc_node [kernel] # time ./udpflood.sh real 0m41.587s user 0m19.190s sys 10m36.370s # time ./udpflood.sh real 0m41.486s user 0m19.290s sys 10m33.650s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-08inetpeer: remove unused listEric Dumazet
Andi Kleen and Tim Chen reported huge contention on inetpeer unused_peers.lock, on memcached workload on a 40 core machine, with disabled route cache. It appears we constantly flip peers refcnt between 0 and 1 values, and we must insert/remove peers from unused_peers.list, holding a contended spinlock. Remove this list completely and perform a garbage collection on-the-fly, at lookup time, using the expired nodes we met during the tree traversal. This removes a lot of code, makes locking more standard, and obsoletes two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also removes two pointers in inet_peer structure. There is still a false sharing effect because refcnt is in first cache line of object [were the links and keys used by lookups are located], we might move it at the end of inet_peer structure to let this first cache line mostly read by cpus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22inet: constify ip headers and in6_addrEric Dumazet
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10inetpeer: Add redirect and PMTU discovery cached info.David S. Miller
Validity of the cached PMTU information is indicated by it's expiration value being non-zero, just as per dst->expires. The scheme we will use is that we will remember the pre-ICMP value held in the metrics or route entry, and then at expiration time we will restore that value. In this way PMTU expiration does not kill off the cached route as is done currently. Redirect information is permanent, or at least until another redirect is received. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10inetpeer: Abstract address representation further.David S. Miller
Future changes will add caching information, and some of these new elements will be addresses. Since the family is implicit via the ->daddr.family member, replicating the family in ever address we store is entirely redundant. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04inetpeer: Move ICMP rate limiting state into inet_peer entries.David S. Miller
Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries. If an inet_peer cannot be bound (the reason is memory allocation failure or similar), the policy is to allow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27inetpeer: Mark metrics as "new" in fresh inetpeer entries.David S. Miller
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically, all ones) on fresh inetpeer entries. This way code can determine if default metrics have been loaded in from a routing table entry already. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27inetpeer: Add metrics storage to inetpeer entries.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01inetpeer: Fix incorrect comment about inetpeer struct size.David S. Miller
Now with ipv6 support it is no longer less than 64 bytes. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01inetpeer: Kill use of inet_peer_address_t typedef.David S. Miller
They are verboten these days. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30inetpeer: Add inet_getpeer_v6()David S. Miller
Now that all of the infrastructure is in place, we can add the ipv6 shorthand for peer creation. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer.David S. Miller
And make an inet_getpeer_v4() helper, update callers. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30inetpeer: Introduce inet_peer_address_t.David S. Miller
Currently only the v4 aspect is used, but this will change. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27inetpeer: __rcu annotationsEric Dumazet
Adds __rcu annotations to inetpeer (struct inet_peer)->avl_left (struct inet_peer)->avl_right This is a tedious cleanup, but removes one smp_wmb() from link_to_pool() since we now use more self documenting rcu_assign_pointer(). Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in all cases we dont need a memory barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-16inetpeer: restore small inet_peer structuresEric Dumazet
Addition of rcu_head to struct inet_peer added 16bytes on 64bit arches. Thats a bit unfortunate, since old size was exactly 64 bytes. This can be solved, using an union between this rcu_head an four fields, that are normally used only when a refcount is taken on inet_peer. rcu_head is used only when refcnt=-1, right before structure freeing. Add a inet_peer_refcheck() function to check this assertion for a while. We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15inetpeer: RCU conversionEric Dumazet
inetpeer currently uses an AVL tree protected by an rwlock. It's possible to make most lookups use RCU 1) Add a struct rcu_head to struct inet_peer 2) add a lookup_rcu_bh() helper to perform lockless and opportunistic lookup. This is a normal function, not a macro like lookup(). 3) Add a limit to number of links followed by lookup_rcu_bh(). This is needed in case we fall in a loop. 4) add an smp_wmb() in link_to_pool() right before node insert. 5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take last reference to an inet_peer, since lockless readers could increase refcount, even while we hold peers.lock. 6) Delay struct inet_peer freeing after rcu grace period so that lookup_rcu_bh() cannot crash. 7) inet_getpeer() first attempts lockless lookup. Note this lookup can fail even if target is in AVL tree, but a concurrent writer can let tree in a non correct form. If this attemps fails, lock is taken a regular lookup is performed again. 8) convert peers.lock from rwlock to a spinlock 9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 -> 128 bytes) In a future patch, this is probably possible to revert this part, if rcu field is put in an union to share space with rid, ip_id_count, tcp_ts & tcp_ts_stamp. These fields being manipulated only with refcnt > 0. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13inetpeer: Optimize inet_getid()Eric Dumazet
While investigating for network latencies, I found inet_getid() was a contention point for some workloads, as inet_peer_idlock is shared by all inet_getid() users regardless of peers. One way to fix this is to make ip_id_count an atomic_t instead of __u16, and use atomic_add_return(). In order to keep sizeof(struct inet_peer) = 64 on 64bit arches tcp_ts_stamp is also converted to __u32 instead of "unsigned long". Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04net: cleanup include/netEric Dumazet
This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11net: remove CVS keywordsAdrian Bunk
This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12[INET]: Use list_head-s in inetpeer.cPavel Emelyanov
The inetpeer.c tracks the LRU list of inet_perr-s, but makes it by hands. Use the list_head-s for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-20[IPV4] inet_peer: Group together avl_left, avl_right, v4daddr to speedup ↵Eric Dumazet
lookups on some CPUS Lot of routers/embedded devices still use CPUS with 16/32 bytes cache lines. (486, Pentium, ... PIII) It makes sense to group together fields used at lookup time so they fit in one cache line. This reduce cache footprint and speedup lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-15[NET]: reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()Eric Dumazet
1) shrink struct inet_peer on 64 bits platforms.
2006-09-28[IPV4]: inetpeer annotationsAl Viro
This one is interesting - we use net-endian value as search key, but order the tree by *host-endian* comparisons of keys. OK since we only care about lookups. Annotated inet_getpeer() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[IPV4]: Safer reassemblyHerbert Xu
Another spin of Herbert Xu's "safer ip reassembly" patch for 2.6.16. (The original patch is here: http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2 and my only contribution is to have tested it.) This patch (optionally) does additional checks before accepting IP fragments, which can greatly reduce the possibility of reassembling fragments which originated from different IP datagrams. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!