summaryrefslogtreecommitdiff
path: root/include/linux/netfilter.h
AgeCommit message (Collapse)Author
2015-06-18netfilter: don't pull include/linux/netfilter.h from netns headersPablo Neira Ayuso
This pulls the full hook netfilter definitions from all those that include net_namespace.h. Instead let's just include the bare minimum required in the new linux/netfilter_defs.h file, and use it from the netfilter netns header files. I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit this compilation error: In file included from include/linux/netfilter_defs.h:4:0, from include/net/netns/netfilter.h:4, from include/net/net_namespace.h:22, from include/linux/netdevice.h:43, from net/netfilter/nfnetlink_queue_core.c:23: include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type struct in_addr in; And also explicit include linux/netfilter.h in several spots. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-05-14netfilter: add netfilter ingress hook after handle_ing() under unique static keyPablo Neira
This patch adds the Netfilter ingress hook just after the existing tc ingress hook, that seems to be the consensus solution for this. Note that the Netfilter hook resides under the global static key that enables ingress filtering. Nonetheless, Netfilter still also has its own static key for minimal impact on the existing handle_ing(). * Without this patch: Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags) 16086246pps 7721Mb/sec (7721398080bps) errors: 100000000 42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch: Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags) 16090536pps 7723Mb/sec (7723457280bps) errors: 100000000 41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb * Without this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags) 10788648pps 5178Mb/sec (5178551040bps) errors: 100000000 40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.77% kpktgend_0 [cls_u32] [k] u32_classify 5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker 3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify 2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags) 10743194pps 5156Mb/sec (5156733120bps) errors: 100000000 42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.70% kpktgend_0 [cls_u32] [k] u32_classify 5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker 2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify 1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk Note that the results are very similar before and after. I can see gcc gets the code under the ingress static key out of the hot path. Then, on that cold branch, it generates the code to accomodate the netfilter ingress static key. My explanation for this is that this reduces the pressure on the instruction cache for non-users as the new code is out of the hot path, and it comes with minimal impact for tc ingress users. Using gcc version 4.8.4 on: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 [...] L1d cache: 16K L1i cache: 64K L2 cache: 2048K L3 cache: 8192K Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add nf_hook_list_active()Pablo Neira
In preparation to have netfilter ingress per-device hook list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: add hook list to nf_hook_statePablo Neira
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netfilter: cleanup struct nf_hook_ops indentationPablo Neira
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Pass socket pointer down through okfn().David Miller
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Add socket pointer to nf_hook_state.David Miller
It is currently always set to NULL, but nf_queue is adjusted to be prepared for it being set to a real socket by taking and releasing a reference to that socket when necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Add nf_hook_state initializer function.David Miller
This way we can consolidate where we setup new nf_hook_state objects, to make sure the entire thing is initialized. The only other place an nf_hook_object is instantiated is nf_queue, wherein a structure copy is used. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Make nf_hookfn use nf_hook_state.David S. Miller
Pass the nf_hook_state all the way down into the hook functions themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-04netfilter: Create and use nf_hook_state.David S. Miller
Instead of passing a large number of arguments down into the nf_hook() entry points, create a structure which carries this state down through the hook processing layers. This makes is so that if we want to change the types or signatures of any of these pieces of state, there are less places that need to be changed. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABELZhouyi Zhou
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: add nftablesPatrick McHardy
This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: pass hook ops to hookfnPatrick McHardy
Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-26netfilter: Remove extern from function prototypesJoe Perches
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13netfilter: nfnetlink_queue: allow to attach expectations to conntracksPablo Neira Ayuso
This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: change sequence number adjustments to 32 bitsPatrick McHardy
Using 16 bits is too small, when many adjustments happen the offsets might overflow and break the connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23netfilter: don't panic on error while walking through the init pathPablo Neira Ayuso
Don't panic if we hit an error while adding the nf_log or pernet netfilter support, just bail out. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-04-05netfilter: remove unneeded variable proc_net_netfilterPablo Neira Ayuso
Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-13UAPI: (Scripted) Disintegrate include/linuxDavid Howells
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-08-30netfilter: add protocol independent NAT corePatrick McHardy
Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-06-22netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and ↵Pablo Neira Ayuso
CONFIG_NF_CT_NETLINK=y LD init/built-in.o net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust' make: *** [vmlinux] Error 1 This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing in Netfilter to solve our complicated configuration dependencies. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-20netfilter: nfq_ct_hook needs __rcu and __read_mostlyPablo Neira Ayuso
This removes some sparse warnings. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangledPablo Neira Ayuso
User-space programs that receive traffic via NFQUEUE may mangle packets. If NAT is enabled, this usually puzzles sequence tracking, leading to traffic disruptions. With this patch, nfnl_queue will make the corresponding NAT TCP sequence adjustment if: 1) The packet has been mangled, 2) the NFQA_CFG_F_CONNTRACK flag has been set, and 3) NAT is detected. There are some records on the Internet complaning about this issue: http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables By now, we only support TCP since we have no helpers for DCCP or SCTP. Better to add this if we ever have some helper over those layer 4 protocols. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-16netfilter: add glue code to integrate nfnetlink_queue and ctnetlinkPablo Neira Ayuso
This patch allows you to include the conntrack information together with the packet that is sent to user-space via NFQUEUE. Previously, there was no integration between ctnetlink and nfnetlink_queue. If you wanted to access conntrack information from your libnetfilter_queue program, you required to query ctnetlink from user-space to obtain it. Thus, delaying the packet processing even more. Including the conntrack information is optional, you can set it via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07netfilter: xt_recent: add address masking optionDenys Fedoryshchenko
The mask option allows you put all address belonging that mask into the same recent slot. This can be useful in case that recent is used to detect attacks from the same network segment. Tested for backward compatibility. Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-20net: Delete all remaining instances of ctl_pathEric W. Biederman
We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-21netfilter: use jump_label for nf_hooksEric Dumazet
On configs where CONFIG_JUMP_LABEL=y, we can replace in fast path a load/compare/conditional jump by a single jump with no dcache reference. Jump target is modified as soon as nf_hooks[pf][hook] switches from empty state to non empty states. jump_label state is kept outside of nf_hooks array so has no cost on cpu caches. This patch removes the test on CONFIG_NETFILTER_DEBUG : No need to call nf_hook_slow() at all if nf_hooks[pf][hook] is empty, this didnt give useful information, but slowed down things a lot. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-27net: Add linux/sysctl.h includes where needed.David S. Miller
Several networking headers were depending upon the implicit linux/sysctl.h include they get when including linux/net.h Add explicit includes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04netfilter: af_info: add 'strict' parameter to limit lookup to .oifFlorian Westphal
ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search to an interface, but this flag cannot be set via struct flowi. Also, it cannot be set via ip6_route_output: this function uses the passed sock struct to determine if this flag is required (by testing for nonzero sk_bound_dev_if). Work around this by passing in an artificial struct sk in case 'strict' argument is true. This is required to replace the rt6_lookup call in xt_addrtype.c with nf_afinfo->route(). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04netfilter: af_info: add network namespace parameter to route hookFlorian Westphal
This is required to eventually replace the rt6_lookup call in xt_addrtype.c with nf_afinfo->route(). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: allow NFQUEUE bypass if no listener is availableFlorian Westphal
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the packet is dropped. This adds a v2 target revision of xt_NFQUEUE that allows packets to continue through the ruleset instead. Because the actual queueing happens outside of the target context, the 'bypass' flag has to be communicated back to the netfilter core. Unfortunately the only choice to do this without adding a new function argument is to use the target function return value (i.e. the verdict). In the NF_QUEUE case, the upper 16bit already contain the queue number to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e. we now have extra room for a new flag. If a hook issued a NF_QUEUE verdict, then the netfilter core will continue packet processing if the queueing hook returns -ESRCH (== "this queue does not exist") and the new NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value. Note: If the queue exists, but userspace does not consume packets fast enough, the skb will still be dropped. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18netfilter: reduce NF_VERDICT_MASK to 0xffFlorian Westphal
NF_VERDICT_MASK is currently 0xffff. This is because the upper 16 bits are used to store errno (for NF_DROP) or the queue number (NF_QUEUE verdict). As there are up to 0xffff different queues available, there is no more room to store additional flags. At the moment there are only 6 different verdicts, i.e. we can reduce NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space. NF_VERDICT_BITS would then be reduced to 8, but because the value is exported to userspace, this might cause breakage; e.g.: e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE' would now break. Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value to the 'userspace compat' section. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-13Merge branch 'master' of ↵Simon Horman
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD
2010-11-17netfilter: allow hooks to pass error code back up the stackEric Paris
SELinux would like to pass certain fatal errors back up the stack. This patch implements the generic netfilter support for this functionality. Based-on-patch-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15netfilter: add __rcu annotationsEric Dumazet
Add some __rcu annotations and use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-12netfilter: NF_HOOK_COND has wrong conditionalEric Paris
The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an error in the code as the order of operations is not what was intended. C will evalutate == before =. Which means ret is getting set to the bool result, rather than the return value of the function call. The code says if (ret = function() == 1) when it meant to say: if ((ret = function()) == 1) Normally the compiler would warn, but it doesn't notice it because its a actually complex conditional and so the wrong code is wrapped in an explict set of () [exactly what the compiler wants you to do if this was intentional]. Fixing this means that errors when netfilter denies a packet get propagated back up the stack rather than lost. Problem introduced by commit 2249065f (netfilter: get rid of the grossness in netfilter.h). Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-19netfilter: restore POST_ROUTING hook in NF_HOOK_CONDPatrick McHardy
Commit 2249065 ("netfilter: get rid of the grossness in netfilter.h") inverted the logic for conditional hook invocation, breaking the POST_ROUTING hook invoked by ip_output(). Correct the logic and remove an unnecessary initialization. Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15netfilter: get rid of the grossness in netfilter.hJan Engelhardt
GCC is now smart enough to follow the inline trail correctly. vmlinux size remain the same. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-15netfilter: reduce NF_HOOK by one argumentJan Engelhardt
No changes in vmlinux filesize. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-02netfilter: xtables: CONFIG_COMPAT reduxAlexey Dobriyan
Ifdef out struct nf_sockopt_ops::compat_set struct nf_sockopt_ops::compat_get struct xt_match::compat_from_user struct xt_match::compat_to_user struct xt_match::compatsize to make structures smaller on COMPAT=n kernels. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-04net: cleanup include/linuxEric Dumazet
This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08netfilter: netns: remove nf_*_net() wrappersAlexey Dobriyan
Now that dev_net() exists, the usefullness of them is even less. Also they're a big problem in resolving circular header dependencies necessary for NOTRACK-in-netns patch. See below. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: Introduce NFPROTO_* constantsJan Engelhardt
The netfilter subsystem only supports a handful of protocols (much less than PF_*) and even non-PF protocols like ARP and pseudo-protocols like PF_BRIDGE. By creating NFPROTO_*, we can earn a few memory savings on arrays that previously were always PF_MAX-sized and keep the pseudo-protocols to ourselves. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: Use unsigned types for hooknum and pf varsJan Engelhardt
and (try to) consistently use u_int8_t for the L3 family. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-05-21netfilter: Move linux/types.h inclusions outside of #ifdef __KERNEL__Patrick McHardy
Greg Steuck <greg@nest.cx> points out that some of the netfilter headers can't be used in userspace without including linux/types.h first. The headers include their own linux/types.h include statements, these are stripped by make headers-install because they are inside #ifdef __KERNEL__ however. Move them out to fix this. Reported and Tested by Greg Steuck. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14[NETFILTER]: Add partial checksum validation helperPatrick McHardy
Move the UDP-Lite conntrack checksum validation to a generic helper similar to nf_checksum() and make it fall back to nf_checksum() in case the full packet is to be checksummed and hardware checksums are available. This is to be used by DCCP conntrack, which also needs to verify partial checksums. Signed-off-by: Patrick McHardy <kaber@trash.net>