summaryrefslogtreecommitdiff
path: root/net/dccp/ipv4.c
AgeCommit message (Collapse)Author
2006-10-24[DCCP]: Update documentation references.Gerrit Renker
Updates the references to spec documents throughout the code, taking into account that * the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year * RFC 1063 was obsoleted by RFC 1191 * draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational RFC, RFC 2923 on 2000-09-22. All references verified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-21[DCCP]: Fix Oops in DCCPv6Gerrit Renker
I think I got the cause for the Oops observed in http://www.mail-archive.com/dccp@vger.kernel.org/msg00578.html The problem is always with applications listening on PF_INET6 sockets. Apart from the mentioned oops, I observed another one one, triggered at irregular intervals via timer interrupt: run_timer_softirq -> dccp_keepalive_timer -> inet_csk_reqsk_queue_prune -> reqsk_free -> dccp_v6_reqsk_destructor The latter function is the problem and is also the last function to be called in said kernel panic. In any case, there is a real problem with allocating the right request_sock which is what this patch tackles. It fixes the following problem: - application listens on PF_INET6 - DCCPv4 packet comes in, is handed over to dccp_v4_do_rcv, from there to dccp_v4_conn_request Now: socket is PF_INET6, packet is IPv4. The following code then furnishes the connection with IPv6 - request_sock operations: req = reqsk_alloc(sk->sk_prot->rsk_prot); The first problem is that all further incoming packets will get a Reset since the connection can not be looked up. The second problem is worse: --> reqsk_alloc is called instead of inet6_reqsk_alloc --> consequently inet6_rsk_offset is never set (dangling pointer) --> the request_sock_ops are nevertheless still dccp6_request_ops --> destructor is called via reqsk_free --> dccp_v6_reqsk_destructor tries to free random memory location (inet6_rsk_offset not set) --> panic I have tested this for a while, DCCP sockets are now handled correctly in all three scenarios (v4/v6 only/v4-mapped). Commiter note: I've added the dccp_request_sock_ops forward declaration to keep the tree building and to reduce the size of the patch for 2.6.19, later I'll move the functions to the top of the affected source code to match what we have in the TCP counterpart, where this problem hasn't existed in the first place, dumb me not to have done the same thing on DCCP land 8) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-10-11[NET]: Use typesafe inet_twsk() inline function instead of cast.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28[IPV4]: ip_route_connect() ipv4 address arguments annotatedAl Viro
annotated address arguments (port number left alone for now); ditto for inferred net-endian variables in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-24[DCCP]: Allow default/fallback service code.Gerrit Renker
This has been discussed on dccp@vger and removes the necessity for applications to supply service codes in each and every case. If an application does not want to provide a service code, that's fine, it will be given 0. Otherwise, service codes can be set via socket options as before. This patch has been tested using various client/server configurations (including listening on multiple service codes). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-09-22[IPV4]: Use network-order dport for all visible inet_lookup_*Herbert Xu
Right now most inet_lookup_* functions take a host-order hnum instead of a network-order dport because that's how it is represented internally. This means that users of these functions have to be careful about using the right byte-order. To add more confusion, inet_lookup takes a network-order dport unlike all other functions. So this patch changes all visible inet_lookup functions to take a dport and move all dport->hnum conversion inside them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[MLSXFRM]: Auto-labeling of child socketsVenkat Yekkirala
This automatically labels the TCP, Unix stream, and dccp child sockets as well as openreqs to be at the same MLS level as the peer. This will result in the selection of appropriately labeled IPSec Security Associations. This also uses the sock's sid (as opposed to the isec sid) in SELinux enforcement of secmark in rcv_skb and postroute_last hooks. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[MLSXFRM]: Add flow labelingVenkat Yekkirala
This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[DCCP]: Fix default sequence window sizeIan McDonald
When using the default sequence window size (100) I got the following in my logs: Jun 22 14:24:09 localhost kernel: [ 1492.114775] DCCP: Step 6 failed for DATA packet, (LSWL(6279674225) <= P.seqno(6279674749) <= S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <= P.ackno(1125899906842620) <= S.AWH(18798206548), sending SYNC... Jun 22 14:24:09 localhost kernel: [ 1492.115147] DCCP: Step 6 failed for DATA packet, (LSWL(6279674225) <= P.seqno(6279674750) <= S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <= P.ackno(1125899906842620) <= S.AWH(18798206549), sending SYNC... I went to alter the default sysctl and it didn't take for new sockets. Below patch fixes this. I think the default is too low but it is what the DCCP spec specifies. As a side effect of this my rx speed using iperf goes from about 2.8 Mbits/sec to 3.5. This is still far too slow but it is a step in the right direction. Compile tested only for IPv6 but not particularly complex change. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-11[DCCP]: Fix leak in net/dccp/ipv4.cEric Sesterhenn
we dont free req if we cant parse the options. This fixes coverity bug id #1046 Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: Identation & other cleanups related to compat_[gs]etsockopt csetArnaldo Carvalho de Melo
No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: {get|set}sockopt compatibility layerDmitry Mishin
This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: ditch dccp_v[46]_ctl_send_ackArnaldo Carvalho de Melo
Merging it with its only user: dccp_v[46]_reqsk_send_ack. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Use sk->sk_prot->max_header consistently for non-data packetsArnaldo Carvalho de Melo
Using this also provides opportunities for introducing inet_csk_alloc_skb that would call alloc_skb, account it to the sock and skb_reserve(max_header), but I'll leave this for later, for now using sk_prot->max_header consistently is enough. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[ICSK]: Introduce inet_csk_ctl_sock_createArnaldo Carvalho de Melo
Consolidating open coded sequences in tcp and dccp, v4 and v6. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP] ipv6: Add missing ipv6 control socketArnaldo Carvalho de Melo
I guess I forgot to add it, nah, now it just works: 18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0) 18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code) Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both IPv6 and IPv4, so I'll come up with another way for freeing the control sockets in upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP] ipv4: make struct dccp_v4_prot staticAdrian Bunk
There's no reason for struct dccp_v4_prot being global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Move the IPv4 specific bits from proto.c to ipv4.cArnaldo Carvalho de Melo
With this patch in place we can break down the complexity by better compartmentalizing the code that is common to ipv6 and ipv4. Now we have these modules: Module Size Used by dccp_diag 1344 0 inet_diag 9448 1 dccp_diag dccp_ccid3 15856 0 dccp_tfrc_lib 12320 1 dccp_ccid3 dccp_ccid2 5764 0 dccp_ipv4 16996 2 dccp 48208 4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4 dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is the next target to work on the "hey, ipv4 is legacy, I only want ipv6 dude!" direction. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Move dccp_hashinfo from ipv4.c to the coreArnaldo Carvalho de Melo
As it is used by both ipv4 and ipv6. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Dont use dccp_v4_checksum in dccp_make_responseArnaldo Carvalho de Melo
dccp_make_response is shared by ipv4/6 and the ipv6 code was recalculating the checksum, not good, so move the dccp_v4_checksum call to dccp_v4_send_response. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Move dccp_[un]hash from ipv4.c to the coreArnaldo Carvalho de Melo
As this is used by both ipv4 and ipv6 and is not ipv4 specific. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Move dccp_v4_{init,destroy}_sock to the coreArnaldo Carvalho de Melo
Removing one more ipv6 uses ipv4 stuff case in dccp land. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Generalize dccp_v4_send_resetArnaldo Carvalho de Melo
Renaming it to dccp_send_reset and moving it from the ipv4 specific code to the core dccp code. This fixes some bugs in IPV6 where timers would send v4 resets, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Call dccp_feat_init more early in dccp_v4_init_sockArnaldo Carvalho de Melo
So that dccp_feat_clean doesn't get confused with uninitialized list_heads. Noticed when testing with no ccid kernel modules. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: sparse endianness annotationsAndrea Bittau
This also fixes the layout of dccp_hdr short sequence numbers, problem was not fatal now as we only support long (48 bits) sequence numbers. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP] CCID: Improve CCID infrastructureArnaldo Carvalho de Melo
1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit} does and anynways neither ccid2 nor ccid3 were using it. 2. Rename struct ccid to struct ccid_operations and introduce struct ccid with a pointer to ccid_operations and rigth after it the rx or tx private state. 3. Remove the pointer to the state of the half connections from struct dccp_sock, now its derived thru ccid_priv() from the ccid pointer. Now we also can implement the setsockopt for changing the CCID easily as no ccid init routines can affect struct dccp_sock in any way that prevents other CCIDs from working if a CCID switch operation is asked by apps. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Initial feature negotiation implementationAndrea Bittau
Still needs more work, but boots and doesn't crashes, even does some negotiation! 18:38:52.174934 127.0.0.1.43458 > 127.0.0.1.5001: request <change_l ack_ratio 2, change_r ccid 2, change_l ccid 2> 18:38:52.218526 127.0.0.1.5001 > 127.0.0.1.43458: response <nop, nop, change_l ack_ratio 2, confirm_r ccid 2 2, confirm_l ccid 2 2, confirm_r ack_ratio 2> 18:38:52.185398 127.0.0.1.43458 > 127.0.0.1.5001: <nop, confirm_r ack_ratio 2, ack_vector0 0x00, elapsed_time 212> :-) Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP] CCID2: Initial CCID2 (TCP-Like) implementationAndrea Bittau
Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several issues on the merge process. For now CCID2 was turned the default for all SOCK_DCCP connections, but this will be remedied soon with the merge of the feature negotiation code. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP]: Don't alloc ack vector for the control sockArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[DCCP] ackvec: Ditch dccpav_buf_lenArnaldo Carvalho de Melo
Simplifying the code a bit as we're always using DCCP_MAX_ACKVEC_LEN. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31[IPV4]: Always set fl.proto in ip_route_newportsPatrick McHardy
ip_route_newports uses the struct flowi from the struct rtable returned by ip_route_connect for the new route lookup and just replaces the port numbers if they have changed. If an IPsec policy exists which doesn't match port 0 the struct flowi won't have the proto field set and no xfrm lookup is done for the changed ports. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Handle NAT in IPsec policy checksPatrick McHardy
Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi of the original packet from the conntrack information for IPsec policy checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[NETFILTER]: Keep conntrack reference until IPsec policy checks are donePatrick McHardy
Keep the conntrack reference until policy checks have been performed for IPsec NAT support. The reference needs to be dropped before a packet is queued to avoid having the conntrack module unloadable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.hArnaldo Carvalho de Melo
To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[SOCK]: Introduce sk_receive_skbArnaldo Carvalho de Melo
Its common enough to to justify that, TCP still can't use it as it has the prequeueing stuff, still to be made generic in the not so distant future :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[IP_SOCKGLUE]: Remove most of the tcp specific callsArnaldo Carvalho de Melo
As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[INET]: Generalise tcp_v4_hash_connectArnaldo Carvalho de Melo
Renaming it to inet_hash_connect, making it possible to ditch dccp_v4_hash_connect and share the same code with TCP instead. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[TWSK]: Introduce struct timewait_sock_opsArnaldo Carvalho de Melo
So that we can share several timewait sockets related functions and make the timewait mini sockets infrastructure closer to the request mini sockets one. Next changesets will take advantage of this, moving more code out of TCP and DCCP v4 and v6 to common infrastructure. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[DCCP]: Use reqsk_free in dccp_v4_conn_requestArnaldo Carvalho de Melo
Now we have the destructor (dccp_v4_reqsk_destructor) in our request_sock_ops vtable. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6Arnaldo Carvalho de Melo
Basically exports a similar set of functions as the one exported by the non-AF specific TCP code. In the process moved some non-AF specific code from dccp_v4_connect to dccp_connect_init and moved the checksum verification from dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv too. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[DCCP]: Just rename dccp_v4_prot to dccp_protArnaldo Carvalho de Melo
To match TCP equivalent. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[DCCP]: Introduce dccp_ipv4_af_opsArnaldo Carvalho de Melo
And make the core DCCP code AF agnostic, just like TCP, now its time to work on net/dccp/ipv6.c, we are close to the end! Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_portArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-21[DCCP]: Comment typoIan McDonald
I hope to actually change this behaviour shortly but this will help anybody grepping code at present. Signed-off-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08[NET]: kfree cleanupJesper Juhl
From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-05[TCP/DCCP]: Randomize port selectionStephen Hemminger
This patch randomizes the port selected on bind() for connections to help with possible security attacks. It should also be faster in most cases because there is no need for a global lock. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20[DCCP]: Clear the IPCB areaHerbert Xu
Turns out the problem has nothing to do with use-after-free or double-free. It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB format that's incompatible with IP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-03[INET]: speedup inet (tcp/dccp) lookupsEric Dumazet
Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18[DCCP]: Don't use necessarily the same CCID for tx and rxArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>