summaryrefslogtreecommitdiff
path: root/net/netfilter/ipvs
AgeCommit message (Collapse)Author
2011-10-17Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-10-12IPVS netns shutdown/startup dead-lockHans Schillstrom
ip_vs_mutext is used by both netns shutdown code and startup and both implicit uses sk_lock-AF_INET mutex. cleanup CPU-1 startup CPU-2 ip_vs_dst_event() ip_vs_genl_set_cmd() sk_lock-AF_INET __ip_vs_mutex sk_lock-AF_INET __ip_vs_mutex * DEAD LOCK * A new mutex placed in ip_vs netns struct called sync_mutex is added. Comments from Julian and Simon added. This patch has been running for more than 3 month now and it seems to work. Ver. 3 IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex instead of __ip_vs_mutex as sugested by Julian. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-10-05netfilter: Use proper rwlock init functionThomas Gleixner
Replace the open coded initialization with the init function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-28Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-2.6
2011-07-22IPVS: Free resources on module removalSimon Horman
This resolves a panic on module removal. Reported-by: Dave Jones <davej@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-21ip: introduce ip_is_fragment helper inline functionPaul Gortmaker
There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-20Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
2011-06-14IPVS: remove unused init and cleanup functions.Hans Schillstrom
After restructuring, there is some unused or empty functions left to be removed. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-14IPVS: labels at pos 0Hans Schillstrom
Put goto labels at the beginig of row acording to coding style example. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-13IPVS netns exit causes crash in conntrackHans Schillstrom
Quote from Patric Mc Hardy "This looks like nfnetlink.c excited and destroyed the nfnl socket, but ip_vs was still holding a reference to a conntrack. When the conntrack got destroyed it created a ctnetlink event, causing an oops in netlink_has_listeners when trying to use the destroyed nfnetlink socket." If nf_conntrack_netlink is loaded before ip_vs this is not a problem. This patch simply avoids calling ip_vs_conn_drop_conntrack() when netns is dying as suggested by Julian. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-13IPVS: rename of netns init and cleanup functions.Hans Schillstrom
Make it more clear what the functions does, on request by Julian. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-13ipvs: support more FTP PASV responsesJulian Anastasov
Change the parsing of FTP commands and responses to support skip character. It allows to detect variations in the 227 PASV response. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-06ipvs: restore support for iptables SNATJulian Anastasov
Fix the IPVS priority in LOCAL_IN hook, so that SNAT target in POSTROUTING is supported for IPVS traffic as in 2.6.36 where it worked depending on module load order. Before 2.6.37 we used priority 100 in LOCAL_IN to process remote requests. We used the same priority as iptables SNAT and if IPVS handlers are installed before SNAT handlers we supported SNAT in POSTROUTING for the IPVS traffic. If SNAT is installed before IPVS, the netfilter handlers are before IPVS and netfilter checks the NAT table twice for the IPVS requests: once in LOCAL_IN where IPS_SRC_NAT_DONE is set and second time in POSTROUTING where the SNAT rules are ignored because IPS_SRC_NAT_DONE was already set in LOCAL_IN. But in 2.6.37 we changed the IPVS priority for LOCAL_IN with the goal to be unique (101) forgetting the fact that for IPVS traffic we should not walk both LOCAL_IN and POSTROUTING nat tables. So, change the priority for processing remote IPVS requests from 101 to 99, i.e. before NAT_SRC (100) because we prefer to support SNAT in POSTROUTING instead of LOCAL_IN. It also moves the priority for IPVS replies from 99 to 98. Use constants instead of magic numbers at these places. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-27IPVS: bug in ip_vs_ftp, same list heaad used in all netns.Hans Schillstrom
When ip_vs was adapted to netns the ftp application was not adapted in a correct way. However this is a fix to avoid kernel errors. In the long term another solution might be chosen. I.e the ports that the ftp appl, uses should be per netns. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-17Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/vmxnet3/vmxnet3_ethtool.c net/core/dev.c
2011-05-15IPVS: fix netns if reading ip_vs_* procfs entriesHans Schillstrom
Without this patch every access to ip_vs in procfs will increase the netns count i.e. an unbalanced get_net()/put_net(). (ipvsadm commands also use procfs.) The result is you can't exit a netns if reading ip_vs_* procfs entries. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-12ipvs: Remove all remaining references to rt->rt_{src,dst}Julian Anastasov
Remove all remaining references to rt->rt_{src,dst} by using dest->dst_saddr to cache saddr (used for TUN mode). For ICMP in FORWARD hook just restrict the rt_mode for NAT to disable LOCALNODE. All other modes do not allow IP_VS_RT_MODE_RDR, so we should be safe with the ICMP forwarding. Using cp->daddr as replacement for rt_dst is safe for all modes except BYPASS, even when cp->dest is NULL because it is cp->daddr that is used to assign cp->dest for sync-ed connections. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipvs: Eliminate rt->rt_dst usage in __ip_vs_get_out_rt().David S. Miller
We can simply track what destination address is used based upon which code block is taken at the top of the function. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipvs: Use IP_VS_RT_MODE_* instead of magic constants.David S. Miller
[ Add some cases I missed, from Julian Anastasov ] Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c
2011-05-10IPVS: init and cleanup restructuringHans Schillstrom
DESCRIPTION This patch tries to restore the initial init and cleanup sequences that was before namspace patch. Netns also requires action when net devices unregister which has never been implemented. I.e this patch also covers when a device moves into a network namespace, and has to be released. IMPLEMENTATION The number of calls to register_pernet_device have been reduced to one for the ip_vs.ko Schedulers still have their own calls. This patch adds a function __ip_vs_service_cleanup() and an enable flag for the netfilter hooks. The nf hooks will be enabled when the first service is loaded and never disabled again, except when a namespace exit starts. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
2011-05-10IPVS: Change of socket usage to enable name space exit.Hans Schillstrom
If the sync daemons run in a name space while it crashes or get killed, there is no way to stop them except for a reboot. When all patches are there, ip_vs_core will handle register_pernet_(), i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed. Kernel threads should not increment the use count of a socket. By calling sk_change_net() after creating a socket this is avoided. sock_release cant be used intead sk_release_kernel() should be used. Thanks Eric W Biederman for your advices. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> [horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
2011-04-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) net: Add support for SMSC LAN9530, LAN9730 and LAN89530 mlx4_en: Restoring RX buffer pointer in case of failure mlx4: Sensing link type at device initialization ipv4: Fix "Set rt->rt_iif more sanely on output routes." MAINTAINERS: add entry for Xen network backend be2net: Fix suspend/resume operation be2net: Rename some struct members for clarity pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev dsa/mv88e6131: add support for mv88e6085 switch ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2) be2net: Fix a potential crash during shutdown. bna: Fix for handling firmware heartbeat failure can: mcp251x: Allow pass IRQ flags through platform data. smsc911x: fix mac_lock acquision before calling smsc911x_mac_read iwlwifi: accept EEPROM version 0x423 for iwl6000 rt2x00: fix cancelling uninitialized work rtlwifi: Fix some warnings/bugs p54usb: IDs for two new devices wl12xx: fix potential buffer overflow in testmode nvs push zd1211rw: reset rx idle timer from tasklet ...
2011-04-05IPVS: combine consecutive #ifdef CONFIG_PROC_FS blocksSimon Horman
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-04-04IPVS: fix NULL ptr dereference in ip_vs_ctl.c ip_vs_genl_dump_daemons()Hans Schillstrom
ipvsadm -ln --daemon will trigger a Null pointer exception because ip_vs_genl_dump_daemons() uses skb_net() instead of skb_sknet(). To prevent others from NULL ptr a check is made in ip_vs.h skb_net(). Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-21IPVS: Use global mutex in ip_vs_app.cSimon Horman
As part of the work to make IPVS network namespace aware __ip_vs_app_mutex was replaced by a per-namespace lock, ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes. Unfortunately this implementation results in ipvs->app_key residing in non-static storage which at the very least causes a lockdep warning. This patch takes the rather heavy-handed approach of reinstating __ip_vs_app_mutex which will cover access to the ipvs->list_head of all network namespaces. [ 12.610000] IPVS: Creating netns size=2456 id=0 [ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP) [ 12.640000] BUG: key ffff880003bbf1a0 not in .data! [ 12.640000] ------------[ cut here ]------------ [ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570() [ 12.640000] Hardware name: Bochs [ 12.640000] Pid: 1, comm: swapper Tainted: G W 2.6.38-kexec-06330-g69b7efe-dirty #122 [ 12.650000] Call Trace: [ 12.650000] [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0 [ 12.650000] [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20 [ 12.650000] [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570 [ 12.650000] [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10 [ 12.650000] [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50 [ 12.650000] [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70 [ 12.650000] [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1c33>] T.620+0x43/0x170 [ 12.660000] [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.670000] [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40 [ 12.670000] [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12 [ 12.670000] [<ffffffff81685a87>] ip_vs_init+0x4c/0xff [ 12.670000] [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e [ 12.670000] [<ffffffff8166583e>] kernel_init+0x13e/0x1c2 [ 12.670000] [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10 [ 12.670000] [<ffffffff8128ad40>] ? restore_args+0x0/0x30 [ 12.680000] [<ffffffff81665700>] ? kernel_init+0x0/0x1c2 [ 12.680000] [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0 Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21ipvs: fix a typo in __ip_vs_control_init()Eric Dumazet
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-15Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 Conflicts: Documentation/feature-removal-schedule.txt
2011-03-15IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()Simon Horman
Break out the portions of __ip_vs_control_init() and __ip_vs_control_cleanup() where aren't necessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Conditionally define and use ip_vs_lblc{r}_tableSimon Horman
ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them are unnecessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefinedSimon Horman
Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined. I tried an approach of breaking the now #ifdef'ed portions out into a separate function. However this appeared to grow the compiled code on x86_64 by about 200 bytes in the case where CONFIG_SYSCTL is defined. So I have gone with the simpler though less elegant #ifdef'ed solution for now. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Conditinally use sysctl_lblc{r}_expirationSimon Horman
In preparation for not including sysctl_lblc{r}_expiration in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add expire_quiescent_template()Simon Horman
In preparation for not including sysctl_expire_quiescent_template in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add sysctl_expire_nodest_conn()Simon Horman
In preparation for not including sysctl_expire_nodest_conn in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add sysctl_sync_ver()Simon Horman
In preparation for not including sysctl_sync_ver in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add {sysctl_sync_threshold,period}()Simon Horman
In preparation for not including sysctl_sync_threshold in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add sysctl_nat_icmp_send()Simon Horman
In preparation for not including sysctl_nat_icmp_send in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add sysctl_snat_reroute()Simon Horman
In preparation for not including sysctl_snat_reroute in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15IPVS: Add ip_vs_route_me_harder()Simon Horman
Add ip_vs_route_me_harder() to avoid repeating the same code twice. Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: rename estimator functionsJulian Anastasov
Rename ip_vs_new_estimator to ip_vs_start_estimator and ip_vs_kill_estimator to ip_vs_stop_estimator to better match their logic. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: optimize rates readingJulian Anastasov
Move the estimator reading from estimation_timer to user context. ip_vs_read_estimator() will be used to decode the rate values. As the decoded rates are not set by estimation timer there is no need to reset them in ip_vs_zero_stats. There is no need ip_vs_new_estimator() to encode stats to rates, if the destination is in trash both the stats and the rates are inactive. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: properly zero stats and ratesJulian Anastasov
Currently, the new percpu counters are not zeroed and the zero commands do not work as expected, we still show the old sum of percpu values. OTOH, we can not reset the percpu counters from user context without causing the incrementing to use old and bogus values. So, as Eric Dumazet suggested fix that by moving all overhead to stats reading in user context. Do not introduce overhead in timer context (estimator) and incrementing (packet handling in softirqs). The new ustats0 field holds the zero point for all counter values, the rates always use 0 as base value as before. When showing the values to user space just give the difference between counters and the base values. The only drawback is that percpu stats are not zeroed, they are accessible only from /proc and are new interface, so it should not be a compatibility problem as long as the sum stats are correct after zeroing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: reorganize tot_statsJulian Anastasov
The global tot_stats contains cpustats field just like the stats for dest and svc, so better use it to simplify the usage in estimation_timer. As tot_stats is registered as estimator we can remove the special ip_vs_read_cpu_stats call for tot_stats. Fix ip_vs_read_cpu_stats to be called under stats lock because it is still used as synchronization between estimation timer and user context (the stats readers). Also, make sure ip_vs_stats_percpu_show reads properly the u64 stats from user context. Signed-off-by: Julian Anastasov <ja@ssi.bg> Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15netfilter:ipvs: use kmemdupShan Wei
The semantic patch that makes this output is available in scripts/coccinelle/api/memdup.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: remove _bh from percpu stats readingJulian Anastasov
ip_vs_read_cpu_stats is called only from timer, so no need for _bh locks. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15ipvs: avoid lookup for fwmark 0Julian Anastasov
Restore the previous behaviour to lookup for fwmark service only when fwmark is non-null. This saves only CPU. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-12ipv6: Convert to use flowi6 where applicable.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Use flowi4 in public route lookup interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>