summaryrefslogtreecommitdiff
path: root/net/ipv6/route.c
AgeCommit message (Collapse)Author
2015-07-03net-ipv6: Delete an unnecessary check before the function call "free_percpu"Markus Elfring
The free_percpu() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Create percpu rt6_infoMartin KaFai Lau
After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Break up ip6_rt_copy()Martin KaFai Lau
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregisterMartin KaFai Lau
This patch keeps track of the DST_NOCACHE routes in a list and replaces its dev with loopback during the iface down/unregister event. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is setMartin KaFai Lau
This patch always creates RTF_CACHE clone with DST_NOCACHE when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to the fl6->daddr. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Only create RTF_CACHE routes after encountering pmtu exceptionMartin KaFai Lau
This patch creates a RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6 tree, the rt->rt6i_node->fn_sernum is bumped which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Combine rt6_alloc_cow and rt6_alloc_cloneMartin KaFai Lau
A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCASTMartin KaFai Lau
When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst. Also, rt6i_gateway is always set to the nexthop while the nexthop could be a gateway or the rt6i_dst.addr. After removing the rt6i_dst and rt6i_src dependency in the last patch, we also need to stop the caller from depending on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21ipv6: reject locally assigned nexthop addressesFlorian Westphal
ip -6 addr add dead::1/128 dev eth0 sleep 5 ip -6 route add default via dead::1/128 -> fails ip -6 addr add dead::1/128 dev eth0 ip -6 route add default via dead::1/128 -> succeeds reason is that if (nonsensensical) route above is added, dead::1 is still subject to DAD, so the route lookup will pick eth0 as outdev due to the prefix route that is added before DAD work is started. Add explicit test that checks if nexthop gateway is a local address. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1167969 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20ipv6: fix ECMP route replacementMichal Kubeček
When replacing an IPv6 multipath route with "ip route replace", i.e. NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first matching route without fixing its siblings, resulting in corrupted siblings linked list; removing one of the siblings can then end in an infinite loop. IPv6 ECMP implementation is a bit different from IPv4 so that route replacement cannot work in exactly the same way. This should be a reasonable approximation: 1. If the new route is ECMP-able and there is a matching ECMP-able one already, replace it and all its siblings (if any). 2. If the new route is ECMP-able and no matching ECMP-able route exists, replace first matching non-ECMP-able (if any) or just add the new one. 3. If the new route is not ECMP-able, replace first matching non-ECMP-able route (if any) or add the new route. We also need to remove the NLM_F_REPLACE flag after replacing old route(s) by first nexthop of an ECMP route so that each subsequent nexthop does not replace previous one. Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20ipv6: do not delete previously existing ECMP routes if add failsMichal Kubeček
If adding a nexthop of an IPv6 multipath route fails, comment in ip6_route_multipath() says we are going to delete all nexthops already added. However, current implementation deletes even the routes it hasn't even tried to add yet. For example, running ip route add 1234:5678::/64 \ nexthop via fe80::aa dev dummy1 \ nexthop via fe80::bb dev dummy1 \ nexthop via fe80::cc dev dummy1 twice results in removing all routes first command added. Limit the second (delete) run to nexthops that succeeded in the first (add) run. Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09ipv6: Fixed source specific default route handling.Markus Stenberg
If there are only IPv6 source specific default routes present, the host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail calls ip6_route_output first, and given source address any, it fails, and ip6_route_get_saddr is never called. The change is to use the ip6_route_get_saddr, even if the initial ip6_route_output fails, and then doing ip6_route_output _again_ after we have appropriate source address available. Note that this is '99% fix' to the problem; a correct fix would be to do route lookups only within addrconf.c when picking a source address, and never call ip6_route_output before source address has been populated. Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flagsMartin KaFai Lau
In my earlier commit: 653437d02f1f ("ipv6: Stop /128 route from disappearing after pmtu update"), there was a horrible typo. Instead of checking RTF_LOCAL on rt->rt6i_flags, it was checked on rt->dst.flags. This patch fixes it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hajime Tazaki <tazaki@sfc.wide.ad.jp> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peerMartin KaFai Lau
_rt6i_peer is no longer needed after the last patch, 'ipv6: Stop rt6_info from using inet_peer's metrics'. DST_METRICS_FORCE_OVERWRITE is added by commit e5fd387ad5b3 ("ipv6: do not overwrite inetpeer metrics prematurely"). Since inetpeer is no longer used for metrics, this bit is also not needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv6: Stop rt6_info from using inet_peer's metricsMartin KaFai Lau
inet_peer is indexed by the dst address alone. However, the fib6 tree could have multiple routing entries (rt6_info) for the same dst. For example, 1. A /128 dst via multiple gateways. 2. A RTF_CACHE route cloned from a /128 route. In the above cases, all of them will share the same metrics and step on each other. This patch will steer away from inet_peer's metrics and use dst_cow_metrics_generic() for everything. Change Highlights: 1. Remove rt6_cow_metrics() which currently acquires metrics from inet_peer for DST_HOST route (i.e. /128 route). 2. Add rt6i_pmtu to take care of the pmtu update to avoid creating a full size metrics just to override the RTAX_MTU. 3. After (2), the RTF_CACHE route can also share the metrics with its dst.from route, by: dst_init_metrics(&cache_rt->dst, dst_metrics_ptr(cache_rt->dst.from), true); 4. Stop creating RTF_CACHE route by cloning another RTF_CACHE route. Instead, directly clone from rt->dst. [ Currently, cloning from another RTF_CACHE is only possible during rt6_do_redirect(). Also, the old clone is removed from the tree immediately after the new clone is added. ] In case of cloning from an older redirect RTF_CACHE, it should work as before. In case of cloning from an older pmtu RTF_CACHE, this patch will forget the pmtu and re-learn it (if there is any) from the redirected route. The _rt6i_peer and DST_METRICS_FORCE_OVERWRITE will be removed in the next cleanup patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv6: Stop /128 route from disappearing after pmtu updateMartin KaFai Lau
This patch is mostly from Steffen Klassert <steffen.klassert@secunet.com>. I only removed the (rt6->rt6i_dst.plen == 128) check from ip6_rt_update_pmtu() because the (rt6->rt6i_flags & RTF_CACHE) test has already implied it. This patch: 1. Create RTF_CACHE route for /128 non local route 2. After (1), all routes that allow pmtu update should have a RTF_CACHE clone. Hence, stop updating MTU for any non RTF_CACHE route. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv6: Extend the route lookups to low priority metrics.Steffen Klassert
We search only for routes with highest priority metric in find_rr_leaf(). However if one of these routes is marked as invalid, we may fail to find a route even if there is a appropriate route with lower priority. Then we loose connectivity until the garbage collector deletes the invalid route. This typically happens if a host route expires afer a pmtu event. Fix this by searching also for routes with a lower priority metric. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv6: Consider RTF_CACHE when searching the fib6 treeMartin KaFai Lau
It is a prep work for the later bug-fix patch which will stop /128 route from disappearing after pmtu update. The later bug-fix patch will allow a /128 route and its RTF_CACHE clone both exist at the same fib6_node. To do this, we need to prepare the existing fib6 tree search to expect RTF_CACHE for /128 route. Note that the fn->leaf is sorted by rt6i_metric. Hence, RTF_CACHE (if there is any) is always at the front. This property leads to the following: 1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which the caller is used to ask for deleting clone or non-clone. The rtm_to_fib6_config() should also check the RTM_F_CLONED and then set RTF_CACHE accordingly so that: - 'ip -6 r del...' will make ip6_route_del() to delete a route and all its clones. Note that its clones is flushed by fib6_del() - 'ip -6 r flush table cache' will make ip6_route_del() to only delete clone(s). 2. Exclude RTF_CACHE from addrconf_get_prefix_route() which should not configure on a cloned route. 3. No change is need for rt6_device_match() since it currently could return a RTF_CACHE clone route, so the later bug-fix patch will not affect it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31netlink: implement nla_get_in_addr and nla_get_in6_addrJiri Benc
Those are counterparts to nla_put_in_addr and nla_put_in6_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31netlink: implement nla_put_in_addr and nla_put_in6_addrJiri Benc
IP addresses are often stored in netlink attributes. Add generic functions to do that. For nla_put_in_addr, it would be nicer to pass struct in_addr but this is not used universally throughout the kernel, in way too many places __be32 is used to store IPv4 address. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31ipv6: coding style: comparison for equality with NULLIan Morris
The ipv6 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-11ipv6: expose RFC4191 route preference via rtnetlinkLubomir Rintel
This makes it possible to retain the route preference when RAs are handled in userspace. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09net: Remove protocol from struct dst_opsEric W. Biederman
After my change to neigh_hh_init to obtain the protocol from the neigh_table there are no more users of protocol in struct dst_ops. Remove the protocol field from dst_ops and all of it's initializers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-14ipv6: fix ipv6_cow_metrics for non DST_HOST caseMartin KaFai Lau
ipv6_cow_metrics() currently assumes only DST_HOST routes require dynamic metrics allocation from inetpeer. The assumption breaks when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric. Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set() is called after the route is created. This patch creates the metrics array (by calling dst_cow_metrics_generic) in ipv6_cow_metrics(). Test: radvd.conf: interface qemubr0 { AdvLinkMTU 1300; AdvCurHopLimit 30; prefix fd00:face:face:face::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; Before: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec fe80::/64 dev eth0 proto kernel metric 256 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec After: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300 fe80::/64 dev eth0 proto kernel metric 256 mtu 1300 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30 Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08rt6_probe_deferred: Do not depend on struct orderingMichael Büsch
rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25ipv6: Fix __ip6_route_redirectMartin KaFai Lau
In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in __ip6_route_redirect is incorrect. The following case is missed: 1. The for loop tries to find a valid gateway rt. If it fails to find one, rt will be NULL. 2. When rt is NULL, it is set to the ip6_null_entry. 3. The newly added 'else if', from a3c00e4, will stop the backtrack from happening. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19ipv6: stop sending PTB packets for MTU < 1280Hagen Paul Pfeifer
Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18netlink: make nlmsg_end() and genlmsg_end() voidJohannes Berg
Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05net: tcp: add RTAX_CC_ALGO fib handlingDaniel Borkmann
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion control metric to be set up and dumped back to user space. While the internal representation of RTAX_CC_ALGO is handled as a u32 key, we avoided to expose this implementation detail to user space, thus instead, we chose the netlink attribute that is being exchanged between user space to be the actual congestion control algorithm name, similarly as in the setsockopt(2) API in order to allow for maximum flexibility, even for 3rd party modules. It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as it should have been stored in RTAX_FEATURES instead, we first thought about reusing it for the congestion control key, but it brings more complications and/or confusion than worth it. Joint work with Florian Westphal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05net: fib6: convert cfg metric to u32 outside of table write lockFlorian Westphal
Do the nla validation earlier, outside the write lock. This is needed by followup patch which needs to be able to call request_module (which can sleep) if needed. Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fnMartin KaFai Lau
This patch save the fn before doing rt6_backtrack. Hence, without redo-ing the fib6_lookup(), saved_fn can be used to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off. Some minor changes I think make sense to review as a single patch: * Remove the 'out:' goto label. * Remove the 'reachable' variable. Only use the 'strict' variable instead. After this patch, "failing ip6_ins_rt()" should be the only case that requires a redo of fib6_lookup(). Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit caseMartin KaFai Lau
When there is a RTF_CACHE hit, no need to redo fib6_lookup() with reachable=0. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24ipv6: Remove BACKTRACK macroMartin KaFai Lau
It is the prep work to reduce the number of calls to fib6_lookup(). The BACKTRACK macro could be hard-to-read and error-prone due to its side effects (mainly goto). This patch is to: 1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following return values: * If it is backtrack-able, returns next fn for retry. * If it reaches the root, returns NULL. 2. The caller needs to decide if a backtrack is needed (by testing rt == net->ipv6.ip6_null_entry). 3. Rename the goto labels in ip6_pol_route() to make the next few patches easier to read. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/r8152.c net/netfilter/nfnetlink.c Both r8152 and nfnetlink conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30ipv6: remove rt6i_genidHannes Frederic Sowa
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly cost extensive operations, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complains so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24ipv6: White-space cleansing : gaps between function and symbol exportIan Morris
This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch removes some blank lines between the end of a function definition and the EXPORT_SYMBOL_GPL macro in order to prevent checkpatch warning that EXPORT_SYMBOL must immediately follow a function. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24ipv6: White-space cleansing : Line LayoutsIan Morris
This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char *)foo etc. * Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21ipv6: slight optimization in ip6_dst_gcLi RongQing
entries is always greater than rt_max_size here, since if entries is less than rt_max_size, the fib6_run_gc function will be skipped Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16net: ipv6: make "ip -6 route get mark xyz" work.Lorenzo Colitti
Currently, "ip -6 route get mark xyz" ignores the mark passed in by userspace. Make it honour the mark, just like IPv4 does. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15ipv6: update Destination Cache entries when gateway turn into hostDuan Jiong
RFC 4861 states in 7.2.5: The IsRouter flag in the cache entry MUST be set based on the Router flag in the received advertisement. In those cases where the IsRouter flag changes from TRUE to FALSE as a result of this update, the node MUST remove that router from the Default Router List and update the Destination Cache entries for all destinations using that neighbor as a router as specified in Section 7.3.3. This is needed to detect when a node that is used as a router stops forwarding packets due to being configured as a host. Currently, when dealing with NA Message which IsRouter flag changes from TRUE to FALSE, the kernel only removes router from the Default Router List, and don't update the Destination Cache entries. Now in order to update those Destination Cache entries, i introduce function rt6_clean_tohost(). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13net: Use fwmark reflection in PMTU discovery.Lorenzo Colitti
Currently, routing lookups used for Path PMTU Discovery in absence of a socket or on unmarked sockets use a mark of 0. This causes PMTUD not to work when using routing based on netfilter fwmark mangling and fwmark ip rules, such as: iptables -j MARK --set-mark 17 ip rule add fwmark 17 lookup 100 This patch causes these route lookups to use the fwmark from the received ICMP error when the fwmark_reflect sysctl is enabled. This allows the administrator to make PMTUD work by configuring appropriate fwmark rules to mark the inbound ICMP packets. Black-box tested using user-mode linux by pointing different fwmarks at routing tables egressing on different interfaces, and using iptables mangling to mark packets inbound on each interface with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery work as expected when mark reflection is enabled and fail when it is disabled. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-28net: ipv6: more places need LOOPBACK_IFINDEX for flowi6_iifJulian Anastasov
To properly match iif in ip rules we have to provide LOOPBACK_IFINDEX in flowi6_iif, not 0. Some ip6mr_fib_lookup and fib6_rule_lookup callers need such fix. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15ipv4: add a sock pointer to dst->output() path.Eric Dumazet
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-14ipv6: Limit mtu to 65575 bytesEric Dumazet
Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31ipv6: reuse rt6_need_strictWang Yufen
Move the whole rt6_need_strict as static inline into ip6_route.h, so that it can be reused Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27ipv6: do not overwrite inetpeer metrics prematurelyMichal Kubeček
If an IPv6 host route with metrics exists, an attempt to add a new route for the same target with different metrics fails but rewrites the metrics anyway: 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1s 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500 RTNETLINK answers: File exists 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 rto_min lock 1.5s This is caused by all IPv6 host routes using the metrics in their inetpeer (or the shared default). This also holds for the new route created in ip6_route_add() which shares the metrics with the already existing route and thus ip6_route_add() rewrites the metrics even if the new route ends up not being used at all. Another problem is that old metrics in inetpeer can reappear unexpectedly for a new route, e.g. 12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000 12sp0:~ # ip route del fec0::1 12sp0:~ # ip route add fec0::1 dev eth0 12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10 12sp0:~ # ip -6 route show fe80::/64 dev eth0 proto kernel metric 256 fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s Resolve the first problem by moving the setting of metrics down into fib6_add_rt2node() to the point we are sure we are inserting the new route into the tree. Second problem is addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE which is set for a new host route in ip6_route_add() and makes ipv6_cow_metrics() always overwrite the metrics in inetpeer (even if they are not "new"); it is reset after that. v5: use a flag in _metrics member rather than one in flags v4: fix a typo making a condition always true (thanks to Hannes Frederic Sowa) v3: rewritten based on David Miller's idea to move setting the metrics (and allocation in non-host case) down to the point we already know the route is to be inserted. Also rebased to net-next as it is quite late in the cycle. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>