path: root/net/netfilter
AgeCommit message (Collapse)Author
2016-09-15netfilter: x_tables: check for size overflowFlorian Westphal
[ Upstream commit d157bd761585605b7882935ffb86286919f62ea1 ] Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
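The overflow described above can be illustrated with a small userspace sketch. It assumes a hypothetical `struct table_info` and `alloc_table_info()` that only mimic the shape of the kernel code (header plus user-controlled payload size); it is not the kernel's `xt_alloc_table_info`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table header followed by a rule blob. */
struct table_info {
    size_t size;              /* payload size requested by the caller */
    unsigned char entries[];  /* rule blob would be copied in here */
};

/*
 * Allocate header + payload. If the sum wraps around size_t (easy to hit
 * on 32-bit systems), the result would be smaller than the header alone,
 * so the request is rejected instead of returning a too-small buffer.
 */
struct table_info *alloc_table_info(size_t size)
{
    size_t total = sizeof(struct table_info) + size;

    if (total < sizeof(struct table_info))  /* addition wrapped around */
        return NULL;

    struct table_info *info = calloc(1, total);
    if (info != NULL)
        info->size = size;
    return info;
}
```

Without the wrap check, a later copy of `size` bytes into the undersized buffer would corrupt the heap, which is the failure mode the commit message describes.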
2016-06-24netfilter: x_tables: introduce and use xt_copy_counters_from_userFlorian Westphal
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream. The three variants use the same copy&pasted code; condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: do compat validation via translate_tableFlorian Westphal
commit 09d9686047dbbe1cf4faa558d3ecc4aae2046054 upstream.

This looks like refactoring, but it's also a bug fix. The problem is that the compat path (32bit iptables, 64bit kernel) lacks a few sanity tests that are done in the normal path. For example, we do not check for underflows and the base chain policies.

While it's possible to also add such checks to the compat path, it's more copy&pastry; for instance, we cannot reuse the check_underflow() helper as e->target_offset differs in the compat case. The other problem is that it makes auditing for validation errors harder; two places need to be checked and kept in sync.

At a high level, 32 bit compat works like this:

1- initial pass over blob: validate match/entry offsets, bounds checking; look up all matches and targets; do bookkeeping wrt. size delta of 32/64bit structures; assign match/target.u.kernel pointer (points at kernel implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to contain the translated ruleset
3- second pass over original blob: for each entry, copy the 32bit representation to the newly allocated memory. This also does any special match translations (e.g. adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5- first pass over translated blob: call the checkentry function of all matches and targets.

The alternative implemented by this patch is to drop steps 3&4 from the compat process; the translation is changed into an intermediate step rather than a full 1:1 translate_table replacement.

In the 2nd pass (step #3), change the 64bit ruleset back to a kernel representation, i.e. put() the kernel pointer and restore ->u.user.name. This gets us a 64bit ruleset that is in the format generated by a 64bit iptables userspace -- we can then use translate_table() to get the 'native' sanity checks.

This has two drawbacks:

1. we re-validate all the match and target entry structure sizes even though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.

The upside is that we get all sanity tests and ruleset validations provided by the normal path and can remove some duplicated compat code.

iptables-restore time of an autogenerated ruleset with 300k chains of the form

  -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
  -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003

shows no noticeable differences in restore times:

  old:   0m30.796s
  new:   0m31.521s
  64bit: 0m25.674s

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: xt_compat_match_from_user doesn't need a retvalFlorian Westphal
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream. Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: don't reject valid target size on some architecturesFlorian Westphal
commit 7b7eba0f3515fca3296b8881d583f7c1042f5226 upstream.

Quoting John Stultz:

  In updating a 32bit arm device from 4.6 to Linus' current HEAD, I noticed I was having some trouble with networking, and realized that /proc/net/ip_tables_names was suddenly empty. Digging through the registration process, it seems we're catching on the:

    if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
        target_offset + sizeof(struct xt_standard_target) != next_offset)
            return -EINVAL;

  Where next_offset seems to be 4 bytes larger than the offset + standard_target struct size.

next_offset needs to be aligned via XT_ALIGN (so we can access all members of ip(6)t_entry struct). This problem didn't show up on i686 as it only needs 4-byte alignment for u64, but iptables userspace on other 32bit arches does insert extra padding.

Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
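To make the alignment point concrete, here is a minimal userspace sketch. `align_up()` and `standard_target_size_ok()` are hypothetical helpers, and the offsets used with them are illustrative numbers rather than real structure sizes; the idea is only that the end of the standard target must be rounded up to the rule alignment before it is compared with next_offset.

```c
#include <stddef.h>

/* Round x up to a multiple of align; align is assumed to be a power of two. */
static size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Returns 1 when a standard target starting at target_offset and occupying
 * target_size bytes ends exactly at next_offset once padding up to `align`
 * is taken into account, 0 otherwise.
 */
int standard_target_size_ok(size_t target_offset, size_t target_size,
                            size_t next_offset, size_t align)
{
    return align_up(target_offset + target_size, align) == next_offset;
}
```

With 8-byte alignment, a target ending at byte 148 is followed by 4 bytes of padding, so next_offset is 152; comparing the unpadded end against next_offset would wrongly reject such a rule.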
2016-06-24netfilter: x_tables: validate all offsets and sizes in a ruleFlorian Westphal
commit 13631bfc604161a9d69cd68991dff8603edd66f9 upstream. Validate that all matches (if any) add up to the beginning of the target and that each match covers at least the base structure size. The compat path should be able to safely re-use the function as the structures only differ in alignment; added a BUILD_BUG_ON just in case we have an arch that adds padding as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: check for bogus target offsetFlorian Westphal
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
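The checks named in this message can be sketched in userspace as follows. `check_entry_offsets()` and its parameters are hypothetical and take plain sizes instead of kernel structures; it is a simplified model of the idea, not the kernel's xt_check_entry_offsets.

```c
#include <errno.h>
#include <stdint.h>

/*
 * entry_size:      size of the fixed rule header (the base structure)
 * target_hdr_size: size of the generic target header
 * target_offset:   where the target starts within the rule
 * target_size:     size the target claims for itself (header included)
 * next_offset:     where the next rule starts
 *
 * The 16-bit inputs are widened to unsigned int, so the sums cannot wrap.
 */
int check_entry_offsets(unsigned int entry_size, unsigned int target_hdr_size,
                        uint16_t target_offset, uint16_t target_size,
                        uint16_t next_offset)
{
    unsigned int toff = target_offset;
    unsigned int tsize = target_size;
    unsigned int noff = next_offset;

    if (toff < entry_size)               /* target would overlap the header */
        return -EINVAL;
    if (toff + target_hdr_size > noff)   /* no room for a target header */
        return -EINVAL;
    if (tsize < target_hdr_size)         /* claimed size below the minimum */
        return -EINVAL;
    if (toff + tsize > noff)             /* target runs past the rule */
        return -EINVAL;
    return 0;
}
```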
2016-06-24netfilter: x_tables: check standard target size tooFlorian Westphal
commit 7ed2abddd20cf8f6bd27f65bd218f26fa5bf7f44 upstream. We have targets and standard targets -- the latter carries a verdict. The ip/ip6tables validation functions will access t->verdict for the standard targets to fetch the jump offset or verdict for chainloop detection, but this happens before the targets get checked/validated. Thus we also need to check for verdict presence here, else t->verdict can point right after a blob. Spotted with UBSAN while testing malformed blobs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add compat version of xt_check_entry_offsetsFlorian Westphal
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream. 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: assert minimum target sizeFlorian Westphal
commit a08e4e190b866579896c09af59b3bdca821da2cd upstream. The target size includes the size of the xt_entry_target struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add and use xt_check_entry_offsetsFlorian Westphal
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream. Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18nf_conntrack: avoid kernel pointer value leak in slab nameLinus Torvalds
commit 31b0b385f69d8d5491a4bca288e25e63f1d945d0 upstream. The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
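A userspace sketch of the counter-based naming described above, using C11 atomics. `make_slab_name()` and the `unique_id` variable are illustrative names, and the `nf_conntrack_` prefix is an assumption made for the example; the point is only that a monotonically increasing counter yields unique names without exposing an address.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Process-wide counter; each caller receives a distinct value. */
static atomic_uint_fast64_t unique_id;

/*
 * Write a unique cache name into buf. The name is derived from a counter
 * rather than from a pointer value, so nothing about the address space
 * can be learned from it. Returns the snprintf() result.
 */
int make_slab_name(char *buf, size_t len)
{
    uint64_t id = (uint64_t)atomic_fetch_add(&unique_id, 1) + 1;

    return snprintf(buf, len, "nf_conntrack_%" PRIu64, id);
}
```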
2016-05-11ipvs: drop first packet to redirect conntrackJulian Anastasov
commit f719e3754ee2f7275437e61a6afd520181fdd43b upstream.

Jiri Bohac reported a problem where an attempt to reschedule an existing connection to another real server needs a proper redirect for the conntrack used by the IPVS connection. For example, when an IPVS connection is created to a NAT-ed real server, we alter the reply direction of the conntrack. If we later decide to select a different real server, we cannot alter the conntrack again. And if we expire the old connection, the new connection is left without conntrack.

So the only way to redirect both the IPVS connection and Netfilter's conntrack is to drop the SYN packet that hits the existing connection, wait for the next jiffie to expire the old connection and its conntrack, and rely on the client's retransmission to create a new connection as usual.

Jiri Bohac provided a fix that drops all SYNs on rescheduling; I extended his patch to do such drops only for connections that use conntrack. Here is the original report from Jiri Bohac:

  Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead"), new connections to dead servers are redistributed immediately to new servers. The old connection is expired using ip_vs_conn_expire_now() which sets the connection timer to expire immediately. However, before the timer callback, ip_vs_conn_expire(), is run to clean the connection's conntrack entry, the new redistributed connection may already be established and its conntrack removed instead.

  Fix this by dropping the first packet of the new connection instead, like we do when the destination server is not available. The timer will have deleted the old conntrack entry long before the first packet of the new connection is retransmitted.

Fixes: dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11ipvs: correct initial offset of Call-ID header search in SIP persistence engineMarco Angaroni
commit 7617a24f83b5d67f4dab1844956be1cebc44aec8 upstream. The IPVS SIP persistence engine is not able to parse the SIP header "Call-ID" when such header is inserted in the first positions of the SIP message. When IPVS is configured with "--pe sip" option, like for example: ipvsadm -A -u 1.2.3.4:5060 -s rr --pe sip -p 120 -o some particular messages (see below for details) do not create entries in the connection template table, which can be listed with: ipvsadm -Lcn --persistent-conn Problematic SIP messages are SIP responses having "Call-ID" header positioned just after message first line: SIP/2.0 200 OK [Call-ID header here] [rest of the headers] When "Call-ID" header is positioned down (after a few other headers) it is correctly recognized. This is due to the data offset used in get_callid function call inside ip_vs_pe_sip.c file: since dptr already points to the start of the SIP message, the value of dataoff should be initially 0. Otherwise the header is searched starting from some bytes after the first character of the SIP message. Fixes: 758ff0338722 ("IPVS: sip persistence engine") Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
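The effect of a wrong starting offset can be reproduced with a small userspace sketch. `find_header()` is a hypothetical, much simplified header search (it only matches a header name at the start of a line) and is not the kernel's SIP parser; the offset 28 below is just an example of a nonzero start that lands past the first header.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search msg[dataoff..len) for a header name that begins a line.
 * Returns the offset just past the name, or -1 if it is not found.
 */
long find_header(const char *msg, size_t len, size_t dataoff, const char *name)
{
    size_t nlen = strlen(name);

    for (size_t i = dataoff; i + nlen <= len; i++) {
        int at_line_start = (i == 0) || (msg[i - 1] == '\n');

        if (at_line_start && memcmp(msg + i, name, nlen) == 0)
            return (long)(i + nlen);
    }
    return -1;
}
```

For a response such as "SIP/2.0 200 OK\r\nCall-ID: abc@host\r\nVia: x\r\n", a search from offset 0 finds the header, while a search that starts 28 bytes in has already skipped past the "Call-ID:" name and reports nothing.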
2016-05-11ipvs: handle ip_vs_fill_iph_skb_off failureArnd Bergmann
commit 3f20efba41916ee17ce82f0fdd02581ada2872b2 upstream. ip_vs_fill_iph_skb_off() may not find an IP header, and gcc has determined that ip_vs_sip_fill_param() then incorrectly accesses the protocol fields: net/netfilter/ipvs/ip_vs_pe_sip.c: In function 'ip_vs_sip_fill_param': net/netfilter/ipvs/ip_vs_pe_sip.c:76:5: error: 'iph.protocol' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (iph.protocol != IPPROTO_UDP) ^ net/netfilter/ipvs/ip_vs_pe_sip.c:81:10: error: 'iph.len' may be used uninitialized in this function [-Werror=maybe-uninitialized] dataoff = iph.len + sizeof(struct udphdr); ^ This adds a check for the ip_vs_fill_iph_skb_off() return code before looking at the ip header data returned from it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: b0e010c527de ("ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-18netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL keyFlorian Westphal
one nft userspace test case fails with 'ct l3proto original ipv4' mismatches 'ct l3proto ipv4' ... because NFTA_CT_DIRECTION attr is missing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-18netfilter: nf_tables: use skb->protocol instead of assuming ethernet headerPablo Neira Ayuso
Otherwise we may end up with incorrect network and transport header for other protocols. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-13netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abortXin Long
When 'nft -f' is used to submit rules, it packs multiple rules into one netlink skb and sends it to the kernel, which processes them one by one. Meanwhile, the kernel adds a trans to commit_list to record every commit. If one of them returns -EAGAIN, status |= NFNL_BATCH_REPLAY is set, and after all processing is done, all the commits are rolled back.

Currently the kernel uses list_add_tail to add a trans to the commit list and list_for_each_entry_safe to roll back, which means that adding and rolling back happen in the same order. This breaks some cases and can even trigger a call trace, for example:

1. add a set into table foo [return -EAGAIN]: commit_list = 'add set trans'
2. del foo: commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans'

Then nf_tables_abort is called to roll back, and it first processes 'add set trans':

  case NFT_MSG_NEWSET:
          trans->ctx.table->use--;
          list_del_rcu(&nft_trans_set(trans)->list);

This deletes the set from table foo, but the set was already removed when table foo was deleted [step 2], so the kernel panics.

The right rollback order is 'del tab trans' -> 'del set trans' -> 'add set trans', which is the opposite of the commit_list order. So fix it by rolling back commits in reverse order in nf_tables_abort.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
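The ordering problem can be modelled with a toy undo log in userspace. Everything here (`enum op`, `struct state`, `undo()`, the two abort functions) is a hypothetical simplification: a "fault" merely counts an undo step that would touch an object that is currently gone, standing in for the panic described above.

```c
#include <stddef.h>

enum op { ADD_SET, DEL_SET, DEL_TABLE };

/* Toy model: one table that may hold one set. */
struct state {
    int table;   /* 1 if the table currently exists */
    int set;     /* 1 if the set currently exists */
    int faults;  /* undo steps that touched a missing object */
};

/* Reverse the effect of a single logged operation. */
static void undo(struct state *s, enum op op)
{
    switch (op) {
    case ADD_SET:            /* unlink the set from its table */
        if (!s->table || !s->set)
            s->faults++;
        s->set = 0;
        break;
    case DEL_SET:            /* put the set back into its table */
        if (!s->table)
            s->faults++;
        s->set = 1;
        break;
    case DEL_TABLE:          /* bring the table back */
        s->table = 1;
        break;
    }
}

/* Roll back in the order the operations were logged (the buggy order). */
int abort_forward(struct state *s, const enum op *log, size_t n)
{
    for (size_t i = 0; i < n; i++)
        undo(s, log[i]);
    return s->faults;
}

/* Roll back newest first, so each undo sees the state it expects. */
int abort_reverse(struct state *s, const enum op *log, size_t n)
{
    for (size_t i = n; i-- > 0; )
        undo(s, log[i]);
    return s->faults;
}
```

Starting from the state after "add set, del set, del table" (no table, no set), the forward rollback hits missing objects and ends with a stray set, while the reverse rollback restores an empty table without faults.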
2015-12-10netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clonesPablo Neira Ayuso
If we attach the sk to the skb from nfnetlink_rcv_batch(), then netlink_skb_destructor() will underflow the socket receive memory counter and we get a warning splat when releasing the socket.

  $ cat /proc/net/netlink
  sk               Eth Pid Groups   Rmem   Wmem Dump Locks Drops Inode
  ffff8800ca903000 12  0   00000000 -54144 0    0    2     0     17942
                                    ^^^^^^

Rmem above shows an underflow. And here below is the warning splat:

  [ 1363.815976] WARNING: CPU: 2 PID: 1356 at net/netlink/af_netlink.c:958 netlink_sock_destruct+0x80/0xb9()
  [...]
  [ 1363.816152] CPU: 2 PID: 1356 Comm: kworker/u16:1 Tainted: G W 4.4.0-rc1+ #153
  [ 1363.816155] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
  [ 1363.816160] Workqueue: netns cleanup_net
  [ 1363.816163] 0000000000000000 ffff880119203dd0 ffffffff81240204 0000000000000000
  [ 1363.816169] ffff880119203e08 ffffffff8104db4b ffffffff813d49a1 ffff8800ca771000
  [ 1363.816174] ffffffff81a42b00 0000000000000000 ffff8800c0afe1e0 ffff880119203e18
  [ 1363.816179] Call Trace:
  [ 1363.816181] <IRQ> [<ffffffff81240204>] dump_stack+0x4e/0x79
  [ 1363.816193] [<ffffffff8104db4b>] warn_slowpath_common+0x9a/0xb3
  [ 1363.816197] [<ffffffff813d49a1>] ? netlink_sock_destruct+0x80/0xb9

skb->sk was only needed to look up the netns; however, we don't need this anymore since 633c9a840d0b ("netfilter: nfnetlink: avoid recurrent netns lookups in call_batch"), so this patch removes this manual socket assignment to resolve the problem.

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
2015-12-10netfilter: nfnetlink: avoid recurrent netns lookups in call_batchPablo Neira Ayuso
Pass the net pointer to the call_batch callback functions so we can skip recurrent lookups. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
2015-12-09netfilter: nfnetlink_queue: Unregister pernet subsys in case of init failureNikolay Borisov
Commit 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}: Register pernet in first place") reorganised the initialisation order of the pernet_subsys to avoid "use-before-initialised" condition. However, in doing so the cleanup logic in nfnetlink_queue got botched in that the pernet_subsys wasn't cleaned in case nfnetlink_subsys_register failed. This patch adds the necessary cleanup routine call. Fixes: 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}: Register pernet in first place") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23netfilter: nfnetlink_queue: avoid harmless unnitialized variable warningsArnd Bergmann
Several ARM default configurations give us warnings on recent compilers about potentially uninitialized variables in the nfnetlink code in two functions: net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message': net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized] if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0) Moving the rcu_dereference(nfnl_ct_hook) call outside of the conditional code avoids the warning without forcing us to preinitialize the variable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: a4b4766c3ceb ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-15ipvs: use skb_to_full_sk() helperEric Dumazet
SYNACK packets might be attached to request sockets. Use skb_to_full_sk() helper to avoid illegal accesses to inet_sk(skb->sk) Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree. This is a large batch that includes fixes for ipset, netfilter ingress, nf_tables dynamic set instantiation and a longstanding Kconfig dependency problem. More specifically, they are:

1) Add missing check for empty hook list at the ingress hook, from Florian Westphal.
2) Input and output interface are swapped at the ingress hook, reported by Patrick McHardy.
3) Resolve ipset extension alignment issues on ARM, patch from Jozsef Kadlecsik.
4) Fix bit check on bitmap in ipset hash type, also from Jozsef.
5) Release buckets when all entries have expired in ipset hash type, again from Jozsef.
6) Oneliner to initialize conntrack tuple object in the PPTP helper, otherwise the conntrack lookup may fail due to random bits in the structure holes, patch from Anthony Lineham.
7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann.
8) Fix Kconfig dependency problems with TPROXY, socket and dup, also from Arnd.
9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters from atomic context, this is required by the follow up fix for nf_tables.
10) Fix crash from the dynamic set expression, we have to add new clone operation that should be defined when a simple memcpy is not enough. This resolves a crash when using per-cpu counters with new Patrick McHardy's flow table nft support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-10netfilter: nf_tables: add clone interface to expression operationsPablo Neira Ayuso
With the conversion of the counter expressions to make it percpu, we need to clone the percpu memory area, otherwise we crash when using counters from flow tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-10netfilter: fix xt_TEE and xt_TPROXY dependenciesArnd Bergmann
Kconfig is too smart for its own good: a Kconfig line that states select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will also be set to 'm', regardless of the state of the symbol from which it is selected. When the xt_TEE driver is built-in and nothing else forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error: net/built-in.o: In function `tee_tg6': net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6' This works around that behavior by changing the dependency to 'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to be built-in as well. The bug only occurs once in thousands of 'randconfig' builds and does not really impact real users. From inspecting the other surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this change should still be harmless. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-10netfilter: nfnetlink_log: work around uninitialized variable warningArnd Bergmann
After a recent (correct) change, gcc started warning about the use of the 'flags' variable in nfulnl_recv_config() net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config': net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or higher (including 5.2.1, which is the latest version I checked) I tried working around it by rearranging the code but had no success with that. As a last resort, this initializes the variable to zero, which shuts up the warning, but means that we don't get a warning if the code is ever changed in a way that actually causes the variable to be used without first being written. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 8cbc870829ec ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-08netfilter: nft_meta: use skb_to_full_sk() helperEric Dumazet
SYNACK packets might be attached to request sockets. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08netfilter: xt_owner: use skb_to_full_sk() helperEric Dumazet
SYNACK packets might be attached to a request socket; xt_owner wants to get the listener in this case. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07netfilter: ipset: Fix hash type expire: release empty hash bucket blockJozsef Kadlecsik
When all entries are expired/all slots are empty, release the bucket. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07netfilter: ipset: Fix hash:* type expirationJozsef Kadlecsik
An incorrect index was used when the data blob was shrunk at expiration, which could lead to falsely expired entries and a memory leak when the comment extension was used too. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07netfilter: ipset: Fix extension alignmentJozsef Kadlecsik
The data extensions in ipset lacked the proper memory alignment and thus could lead to kernel crash on several architectures. Therefore the structures have been reorganized and alignment attributes added where needed. The patch was tested on armv7h by Gerhard Wiesinger and on x86_64, sparc64 by Jozsef Kadlecsik. Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Conflicts: net/netfilter/xt_TEE.c Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix crash when TEE target is used with no --oif, from Eric Dumazet. 2) Oneliner to fix a crash on the redirect traffic to localhost infrastructure when interface has not yet an address, from Munehisa Kamata. 3) Oneliner not to request module all the time from nfnetlink due to wrong type value, from Florian Westphal. I'll make sure these patches 1 and 2 hit -stable. ==================== The conflict in net/netfilter/xt_TEE.c was minor, a change to the 'oif' selection overlapping a function signature change for the nf_dup_ipv{4,6}() routines. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-28netfilter: nfnetlink: don't probe module if it existsFlorian Westphal
nfnetlink_bind request_module()s all the time as nfnetlink_get_subsys() shifts the argument by 8 to obtain the subsys id. So using type instead of type << 8 always returns NULL. Fixes: 03292745b02d11 ("netlink: add nlk->netlink_bind hook for module auto-loading") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
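The lookup mismatch is easy to reproduce in a userspace sketch. `SUBSYS_ID()`, `subsys_register()` and `get_subsys()` are hypothetical names modelled on the description above (the subsystem id lives in the upper byte of a 16-bit message type); the table size is an arbitrary choice for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SUBSYS_COUNT 16
/* The subsystem id occupies the upper byte of the 16-bit message type. */
#define SUBSYS_ID(type) (((type) & 0xff00) >> 8)

struct subsys {
    const char *name;
};

static const struct subsys *subsys_table[SUBSYS_COUNT];

/* Register a subsystem under its bare id. */
void subsys_register(uint8_t id, const struct subsys *ss)
{
    if (id < SUBSYS_COUNT)
        subsys_table[id] = ss;
}

/* Look up by message type: the id is extracted by shifting right by 8. */
const struct subsys *get_subsys(uint16_t type)
{
    uint8_t id = SUBSYS_ID(type);

    return id < SUBSYS_COUNT ? subsys_table[id] : NULL;
}
```

Passing the bare id 3 to `get_subsys()` extracts id 0 and finds nothing, so a caller would conclude the module is missing and probe for it every time; passing `3 << 8` finds the registered entry.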
2015-10-27netfilter: nf_nat_redirect: add missing NULL pointer checkMunehisa Kamata
Commit 8b13eddfdf04cbfa561725cfc42d6868fe896f56 ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables") has introduced a trivial logic change which can result in the following crash.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
  IP: [<ffffffffa033002d>] nf_nat_redirect_ipv4+0x2d/0xa0 [nf_nat_redirect]
  PGD 3ba662067 PUD 3ba661067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: ipv6(E) xt_REDIRECT(E) nf_nat_redirect(E) xt_tcpudp(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) ip_tables(E) x_tables(E) binfmt_misc(E) xfs(E) libcrc32c(E) evbug(E) evdev(E) psmouse(E) i2c_piix4(E) i2c_core(E) acpi_cpufreq(E) button(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
  CPU: 0 PID: 2536 Comm: ip Tainted: G E 4.1.7-15.23.amzn1.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
  task: ffff8800eb438000 ti: ffff8803ba664000 task.ti: ffff8803ba664000
  [...]
  Call Trace:
  <IRQ>
  [<ffffffffa0334065>] redirect_tg4+0x15/0x20 [xt_REDIRECT]
  [<ffffffffa02e2e99>] ipt_do_table+0x2b9/0x5e1 [ip_tables]
  [<ffffffffa0328045>] iptable_nat_do_chain+0x25/0x30 [iptable_nat]
  [<ffffffffa031777d>] nf_nat_ipv4_fn+0x13d/0x1f0 [nf_nat_ipv4]
  [<ffffffffa0328020>] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat]
  [<ffffffffa031785e>] nf_nat_ipv4_in+0x2e/0x90 [nf_nat_ipv4]
  [<ffffffffa03280a5>] iptable_nat_ipv4_in+0x15/0x20 [iptable_nat]
  [<ffffffff81449137>] nf_iterate+0x57/0x80
  [<ffffffff814491f7>] nf_hook_slow+0x97/0x100
  [<ffffffff814504d4>] ip_rcv+0x314/0x400

  unsigned int
  nf_nat_redirect_ipv4(struct sk_buff *skb,
  ...
  {
  ...
          rcu_read_lock();
          indev = __in_dev_get_rcu(skb->dev);
          if (indev != NULL) {
                  ifa = indev->ifa_list;
                  newdst = ifa->ifa_local; <---
          }
          rcu_read_unlock();
  ...
  }

Before the commit, 'ifa' had been always checked before access. After the commit, however, it could be accessed even if it's NULL. Interestingly, this was once fixed in 2003.

http://marc.info/?l=netfilter-devel&m=106668497403047&w=2

In addition to the original one, we have seen the crash when packets that need to be redirected somehow arrive on an interface which hasn't been yet fully configured.

This change just reverts the logic to the old behavior to avoid the crash.

Fixes: 8b13eddfdf04 ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
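The guarded access described here can be sketched in userspace with stand-in types. `struct ifaddr`, `struct in_dev` and `pick_redirect_addr()` are hypothetical simplifications of the kernel structures; returning 0 for "no usable address" is an assumption of this sketch, leaving the caller to decide what to do with the packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for an interface and its address list. */
struct ifaddr {
    uint32_t local;         /* local IPv4 address */
    struct ifaddr *next;
};

struct in_dev {
    struct ifaddr *ifa_list;  /* NULL until an address is configured */
};

/*
 * Return the first local address of the device, or 0 when the device is
 * missing or has no address yet. Both pointers are checked before use, so
 * a half-configured interface cannot cause a NULL dereference.
 */
uint32_t pick_redirect_addr(const struct in_dev *indev)
{
    uint32_t newdst = 0;

    if (indev != NULL && indev->ifa_list != NULL)
        newdst = indev->ifa_list->local;

    return newdst;
}
```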
2015-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflict was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22netfilter: xt_TEE: fix NULL dereferenceEric Dumazet
iptables -I INPUT ... -j TEE --gateway 10.1.2.3 <crash> because --oif was not specified tee_tg_check() sets ->priv pointer to NULL in this case. Fixes: bbde9fc1824a ("netfilter: factor out packet duplication for IPv4/IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-17Merge branch 'master' of ↵Pablo Neira Ayuso
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Conflicts:
	net/bridge/br_netfilter_hooks.c
2015-10-17netfilter: ipset: Fix sleeping memory allocation in atomic contextNikolay Borisov
Commit 00590fdd5be0 introduced RCU locking in list type and in doing so introduced a memory allocation in list_set_add, which is done in an atomic context, due to the fact that ipset rcu list modifications are serialised with a spin lock. The reason why we can't use a mutex is that in addition to modifying the list with ipset commands, it's also being modified when a particular ipset rule timeout expires, aka garbage collection. This gc is triggered from set_cleanup_entries, which in turn is invoked from a timer, thus requiring the lock to be bh-safe.

Concretely, the following call chain can lead to a "sleeping function called in atomic context" splat: call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL). GFP_KERNEL allows initiating direct reclaim, so the allocation path can potentially sleep.

To fix the issue, change the allocation type to GFP_ATOMIC, to correctly reflect that it is occurring in an atomic context.

Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: nf_queue: remove rcu_read_lock callsFlorian Westphal
All verdict handlers make use of the nfnetlink .call_rcu callback so rcu readlock is already held. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: make nf_queue_entry_get_refs return voidFlorian Westphal
We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: remove hook owner refcountingFlorian Westphal
Since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when a module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-15netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicityPablo Neira
Check that dependencies are fulfilled before updating the logger instance, otherwise we can leave things in intermediate state on errors in nfulnl_recv_config(). [ Ken-ichirou reports that this is also fixing missing instance refcnt drop on error introduced in his patch 914eebf2f434 ("netfilter: nfnetlink_log: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag"). ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
2015-10-15netfilter: nfnetlink_log: consolidate check for instance in nfulnl_recv_config()Pablo Neira Ayuso
This patch consolidates the check for valid logger instance once we have passed the command handling:

The config message that we receive may contain the following info:

1) Command only: We always get a valid instance pointer if we just created it. In case that the instance is being destroyed or the command is unknown, we jump to exit path of nfulnl_recv_config(). This patch doesn't modify this handling.

2) Config only: In this case, the instance must always exist since the user is asking for configuration updates. If the instance doesn't exist this returns -ENODEV.

3) No command and no configs are specified: This case is rare. The user is sending us a config message with neither commands nor config options. In this case, we have to check if the instance exists and bail out otherwise.

Before this patch, it was possible to send a config message with no command and no config updates for a nonexistent instance without triggering an error. So this is the only case that changes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
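The effect of consolidating the check can be illustrated with a small userspace model. This is a sketch under stated assumptions rather than code from nfnetlink_log: `check_before()` and `check_after()` are hypothetical helpers, the "before" behavior is reconstructed only from the description above, and both model the point reached after command handling did not jump to the exit path.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Assumed pre-patch behavior: the instance was only required when the
 * message carried config options, so an empty message for a missing
 * instance went through without an error.
 */
static int check_before(bool has_config, bool inst_exists)
{
	if (has_config && !inst_exists)
		return -ENODEV;
	return 0;
}

/*
 * Consolidated check: once command handling is done, a missing instance
 * is always an error, whether or not config options are present.
 */
static int check_after(bool inst_exists)
{
	return inst_exists ? 0 : -ENODEV;
}
```

The two helpers only disagree for a message with no config options and no existing instance, which matches the single case the message says has changed.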
2015-10-13netfilter: sync with packet rx also after removing queue entriesFlorian Westphal
We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen:

cpu1: nf_unregister_hook(H) called, H unlinked from lists, calls synchronize_net() to wait for packet rx completion.

Problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry:

cpu2: receive verdict from userspace, lock queue
cpu2: unlink nf_queue_entry struct E, which references H, from queue list
cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock
cpu2: unlock queue
cpu1: nf_queue_nf_hook_drop drops affected queue entries
cpu2: call nf_reinject for E
cpu1: kfree(H)
cpu2: potential use-after-free for H

Cc: Eric W. Biederman <ebiederm@xmission.com>
Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-13netfilter: nfqueue: don't use prev pointerFlorian Westphal
Usage of ->prev seems buggy. While the packet was out, our hook cannot be removed, but we have no way to know if the previous one is still valid. So better not use ->prev at all. Since NF_REPEAT just asks to invoke the same hook function again, just do so, and continue with nf_iterate if we get an ACCEPT verdict.

A side effect of this change is that if nf_reinject(NF_REPEAT) causes another REPEAT we will now drop the skb instead of a kernel loop. However, NF_REPEAT loops would be a bug so this should not happen anyway.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
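The verdict handling described here can be sketched in userspace as below. This is an illustrative model, not the kernel's nf_reinject(): the `fake_verdict` enum, the `fake_hook_fn` type, `fake_reinject()` and the `always_*` sample hooks are all hypothetical, and `rest_of_chain` merely stands in for continuing with nf_iterate over the hooks that follow.

```c
#include <stddef.h>

/* Hypothetical verdicts; the kernel's NF_* values are not reproduced here. */
enum fake_verdict { FAKE_DROP, FAKE_ACCEPT, FAKE_REPEAT };

typedef enum fake_verdict (*fake_hook_fn)(void *pkt);

/* Sample hooks used only to exercise the sketch. */
static enum fake_verdict always_accept(void *pkt) { (void)pkt; return FAKE_ACCEPT; }
static enum fake_verdict always_repeat(void *pkt) { (void)pkt; return FAKE_REPEAT; }
static enum fake_verdict always_drop(void *pkt)   { (void)pkt; return FAKE_DROP; }

/*
 * Handle a verdict for a packet that was queued by `hook`.
 * REPEAT re-invokes the same hook once, a second REPEAT is turned into a
 * drop instead of looping, and ACCEPT continues with the later hooks.
 * No pointer to a previous hook is needed.
 */
static enum fake_verdict fake_reinject(enum fake_verdict verdict,
				       fake_hook_fn hook,
				       fake_hook_fn rest_of_chain,
				       void *pkt)
{
	if (verdict == FAKE_REPEAT) {
		verdict = hook(pkt);
		if (verdict == FAKE_REPEAT)
			return FAKE_DROP;
	}

	if (verdict == FAKE_ACCEPT)
		verdict = rest_of_chain(pkt);

	return verdict;
}
```

For example, `fake_reinject(FAKE_REPEAT, always_repeat, always_accept, NULL)` ends in a drop rather than spinning, which mirrors the side effect noted in the message.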
2015-10-12ipv4: Pass struct net into ip_defrag and ip_check_defragEric W. Biederman
The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12netfilter: nfnetlink_log: autoload nf_conntrack_netlink module ↵Ken-ichirou MATSUZAWA
NFQA_CFG_F_CONNTRACK config flag

This patch enables loading the nf_conntrack_netlink module if the NFULNL_CFG_F_CONNTRACK config flag is specified.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-12Merge tag 'ipvs4-for-v4.4' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next

Simon Horman says:

====================
Fourth Round of IPVS Updates for v4.4

Please consider these build warning cleanups from David Ahern and myself. They resolve some minor side effects of Eric Biederman's heroic work to clean up IPVS, which you recently pulled: it's queued up for v4.4, so no need to worry about earlier kernel versions.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>