summaryrefslogtreecommitdiff
path: root/net/openvswitch
AgeCommit message (Collapse)Author
2019-12-05openvswitch: remove another BUG_ON()Paolo Abeni
[ Upstream commit 8a574f86652a4540a2433946ba826ccb87f398cc ] If we can't build the flow del notification, we can simply delete the flow, no need to crash the kernel. Still keep a WARN_ON to preserve debuggability. Note: the BUG_ON() predates the Fixes tag, but this change can be applied only after the mentioned commit. v1 -> v2: - do not leak an skb on error Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()Paolo Abeni
[ Upstream commit 8ffeb03fbba3b599690b361467bfd2373e8c450f ] All the callers of ovs_flow_cmd_build_info() already deal with error return code correctly, so we can handle the error condition in a more gracefull way. Still dump a warning to preserve debuggability. v1 -> v2: - clarify the commit message - clean the skb and report the error (DaveM) Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05openvswitch: fix flow command message sizePaolo Abeni
[ Upstream commit 4e81c0b3fa93d07653e2415fa71656b080a112fd ] When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant flow has no UFID, we can exceed the computed size, as ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY attribute. Take the above in account when computing the flow command message size. Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") Reported-by: Qi Jun Ding <qding@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-25net: ovs: fix return type of ndo_start_xmit functionYueHaibing
[ Upstream commit eddf11e18dff0e8671e06ce54e64cfc843303ab9 ] The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver has returns 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05openvswitch: change type of UPCALL_PID attribute to NLA_UNSPECLi RongQing
[ Upstream commit ea8564c865299815095bebeb4b25bef474218e4c ] userspace openvswitch patch "(dpif-linux: Implement the API functions to allow multiple handler threads read upcall)" changes its type from U32 to UNSPEC, but leave the kernel unchanged and after kernel 6e237d099fac "(netlink: Relax attr validation for fixed length types)", this bug is exposed by the below warning [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's") Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27openvswitch: fix flow actions reallocationAndrea Righi
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ] The flow action buffer can be resized if it's not big enough to contain all the requested flow actions. However, this resize doesn't take into account the new requested size, the buffer is only increased by a factor of 2x. This might be not enough to contain the new data, causing a buffer overflow, for example: [ 42.044472] ============================================================================= [ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten [ 42.046415] ----------------------------------------------------------------------------- [ 42.047715] Disabling lock debugging due to kernel taint [ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc [ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101 [ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb [ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........ [ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l... [ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x... [ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 .... [ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Fix by making sure the new buffer is properly resized to contain all the requested data. BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06openvswitch: Avoid OOB read when parsing flow nlattrsRoss Lagerwall
[ Upstream commit 04a4af334b971814eedf4e4a413343ad3287d9a9 ] For nested and variable attributes, the expected length of an attribute is not known and marked by a negative number. This results in an OOB read when the expected length is later used to check if the attribute is all zeros. Fix this by using the actual length of the attribute rather than the expected length. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is foundStefano Brivio
[ Upstream commit 72f17baf2352ded6a1d3f4bb2d15da8c678cd2cb ] If an OVS_ATTR_NESTED attribute type is found while walking through netlink attributes, we call nlattr_set() recursively passing the length table for the following nested attributes, if different from the current one. However, once we're done with those sub-nested attributes, we should continue walking through attributes using the current table, instead of using the one related to the sub-nested attributes. For example, given this sequence: 1 OVS_KEY_ATTR_PRIORITY 2 OVS_KEY_ATTR_TUNNEL 3 OVS_TUNNEL_KEY_ATTR_ID 4 OVS_TUNNEL_KEY_ATTR_IPV4_SRC 5 OVS_TUNNEL_KEY_ATTR_IPV4_DST 6 OVS_TUNNEL_KEY_ATTR_TTL 7 OVS_TUNNEL_KEY_ATTR_TP_SRC 8 OVS_TUNNEL_KEY_ATTR_TP_DST 9 OVS_KEY_ATTR_IN_PORT 10 OVS_KEY_ATTR_SKB_MARK 11 OVS_KEY_ATTR_MPLS we switch to the 'ovs_tunnel_key_lens' table on attribute #3, and we don't switch back to 'ovs_key_lens' while setting attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is 15, we also get this kind of KASan splat while accessing the wrong table: [ 7654.586496] ================================================================== [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch] [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430 [ 7654.610983] [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1 [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 [ 7654.631379] Call Trace: [ 7654.634108] [<ffffffffb65a7c50>] dump_stack+0x19/0x1b [ 7654.639843] [<ffffffffb53ff373>] print_address_description+0x33/0x290 [ 7654.647129] [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch] [ 7654.654607] [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330 [ 7654.661406] [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40 [ 7654.668789] [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch] [ 7654.676076] [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch] [ 7654.684234] [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40 [ 7654.689968] [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590 [ 7654.696574] [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch] [ 7654.705122] [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0 [ 7654.712503] [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21 [ 7654.719401] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.726298] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.733195] [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50 [ 7654.740187] [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0 [ 7654.746406] [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20 [ 7654.752914] [<ffffffffb53fe711>] ? memset+0x31/0x40 [ 7654.758456] [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch] [snip] [ 7655.132484] The buggy address belongs to the variable: [ 7655.138226] ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch] [ 7655.145507] [ 7655.147166] Memory state around the buggy address: [ 7655.152514] ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa [ 7655.160585] ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa [ 7655.176701] ^ [ 7655.184372] ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05 [ 7655.192431] ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.200490] ================================================================== Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24openvswitch: Delete conntrack entry clashing with an expectation.Jarno Rajahalme
[ Upstream commit cf5d70918877c6a6655dc1e92e2ebb661ce904fd ] Conntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03openvswitch: fix the incorrect flow action alloc sizezhangliping
[ Upstream commit 67c8d22a73128ff910e2287567132530abcf5b71 ] If we want to add a datapath flow, which has more than 500 vxlan outputs' action, we will get the following error reports: openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Actions may not be safe on all matching packets ... ... It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but this is not the root cause. For example, for a vxlan output action, we need about 60 bytes for the nlattr, but after it is converted to the flow action, it only occupies 24 bytes. This means that we can still support more than 1000 vxlan output actions for a single datapath flow under the the current 32k max limitation. So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we shouldn't report EINVAL and keep it move on, as the judgement can be done by the reserve_sfa_size. Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11openvswitch: fix potential out of bound access in parse_ctLiping Zhang
[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ] Before the 'type' is validated, we shouldn't use it to fetch the ovs_ct_attr_lens's minlen and maxlen, else, out of bound access may happen. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30net/openvswitch: Set the ipv6 source tunnel key address attribute correctlyOr Gerlitz
[ Upstream commit 3d20f1f7bd575d147ffa75621fa560eea0aec690 ] When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-04openvswitch: maintain correct checksum state in conntrack actionsLance Richardson
[ Upstream commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee ] When executing conntrack actions on skbuffs with checksum mode CHECKSUM_COMPLETE, the checksum must be updated to account for header pushes and pulls. Otherwise we get "hw csum failure" logs similar to this (ICMP packet received on geneve tunnel via ixgbe NIC): [ 405.740065] genev_sys_6081: hw csum failure [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1 [ 405.740108] Call Trace: [ 405.740110] <IRQ> [ 405.740113] dump_stack+0x63/0x87 [ 405.740116] netdev_rx_csum_fault+0x3a/0x40 [ 405.740118] __skb_checksum_complete+0xcf/0xe0 [ 405.740120] nf_ip_checksum+0xc8/0xf0 [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4] [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack] [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch] [ 405.740145] ? netif_rx_internal+0x44/0x110 [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch] [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch] [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch] [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch] [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch] [ 405.740168] ? udp_rcv+0x1a/0x20 [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0 [ 405.740172] ? ip_local_deliver+0x6f/0xe0 [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0 [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0 [ 405.740177] ? ip_rcv+0x2a7/0x400 [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00 [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch] [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00 [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe] [ 405.740197] __netif_receive_skb+0x18/0x60 [ 405.740199] netif_receive_skb_internal+0x40/0xb0 [ 405.740201] napi_gro_receive+0xcd/0x120 [ 405.740204] gro_cell_poll+0x57/0x80 [geneve] [ 405.740206] net_rx_action+0x260/0x3c0 [ 405.740209] __do_softirq+0xc9/0x28c [ 405.740211] irq_exit+0xd9/0xf0 [ 405.740213] do_IRQ+0x51/0xd0 [ 405.740215] common_interrupt+0x93/0x93 Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg
[ Upstream commit 7e059158d57b79159eaf1f504825d19866ef2c42 ] Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18openvswitch: use flow protocol when recalculating ipv6 checksumsSimon Horman
[ Upstream commit b4f70527f052b0c00be4d7cac562baa75b212df5 ] When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the flow key rather than that in the set field action. Fixes: 83d2b9ba1abc ("net: openvswitch: Support masked set actions.") Cc: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18net: use skb_postpush_rcsum instead of own implementationsDaniel Borkmann
[ Upstream commit 6b83d28a55a891a9d70fc61ccb1c138e47dcbe74 ] Replace individual implementations with the recently introduced skb_postpush_rcsum() helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tom Herbert <tom@herbertland.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03lwt: fix rx checksum setting for lwt devices tunneling over ipv6Paolo Abeni
[ Upstream commit c868ee7063bdb53f3ef9eac7bcec84960980b471 ] the commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.") changed the default xmit checksum setting for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not set into external UDP header. This commit changes the rx checksum setting for both lwt vxlan/geneve devices created by openvswitch accordingly, so that lwt over ipv6 tunnel pairs are again able to communicate with default values. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31net: preserve IP control block during GSO segmentationKonstantin Khlebnikov
[ Upstream commit 9207f9d45b0ad071baa128e846d7e7ed85016df3 ] Skb_gso_segment() uses skb control block during segmentation. This patch adds 32-bytes room for previous control block which will be copied into all resulting segments. This patch fixes kernel crash during fragmenting forwarded packets. Fragmentation requires valid IP CB in skb for clearing ip options. Also patch removes custom save/restore in ovs code, now it's redundant. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-29openvswitch: Fix template leak in error cases.Joe Stringer
Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a reference leak on helper objects, but inadvertently introduced a leak on the ct template. Previously, ct_info.ct->general.use was initialized to 0 by nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action() returned successful. If an error occurred while adding the helper or adding the action to the actions buffer, the __ovs_ct_free_action() cleanup would use nf_ct_put() to free the entry; However, this relies on atomic_dec_and_test(ct_info.ct->general.use). This reference must be incremented first, or nf_ct_put() will never free it. Fix the issue by acquiring a reference to the template immediately after allocation. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18openvswitch: correct encoding of set tunnel action attributesSimon Horman
In a set action tunnel attributes should be encoded in a nested action. I noticed this because ovs-dpctl was reporting an error when dumping flows due to the incorrect encoding of tunnel attributes in a set action. Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.") Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11openvswitch: Respect conntrack zone even if invalidJoe Stringer
If userspace executes ct(zone=1), and the connection tracker determines that the packet is invalid, then the ct_zone flow key field is populated with the default zone rather than the zone that was specified. Even though connection tracking failed, this field should be updated with the value that the action specified. Fix the issue. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11openvswitch: Fix helper reference leakJoe Stringer
If the actions (re)allocation fails, or the actions list is larger than the maximum size, and the conntrack action is the last action when these problems are hit, then references to helper modules may be leaked. Fix the issue. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03openvswitch: fix hangup on vxlan/gre/geneve device deletionPaolo Abeni
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference to the underlying tunnel device, but never released it when such device is deleted. Deleting the underlying device via the ip tool cause the kernel to hangup in the netdev_wait_allrefs() loop. This commit ensure that on device unregistration dp_detach_port_notify() is called for all vports that hold the device reference, properly releasing it. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02openvswitch: properly refcount vport-vxlan modulePaolo Abeni
After 614732eaa12d, no refcount is maintained for the vport-vxlan module. This allows the userspace to remove such module while vport-vxlan devices still exist, which leads to later oops. v1 -> v2: - move vport 'owner' initialization in ovs_vport_ops_register() and make such function a macro Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24net: openvswitch: Remove invalid commentAaron Conole
During pre-upstream development, the openvswitch datapath used a custom hashtable to store vports that could fail on delete due to lack of memory. However, prior to upstream submission, this code was reworked to use an hlist based hastable with flexible-array based buckets. As such the failure condition was eliminated from the vport_del path, rendering this comment invalid. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-10-27openvswitch: Fix skb leak using IPv6 defragJoe Stringer
nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the reassembly is successful, expects the caller to free all of the original skbs using nf_ct_frag6_consume_orig(). This call was previously missing, meaning that the original fragments were never freed (with the exception of the last fragment to arrive). Fix this by ensuring that all original fragments except for the last fragment are freed via nf_ct_frag6_consume_orig(). The last fragment will be morphed into the head, so it must not be freed yet. Furthermore, retain the ->next pointer for the head after skb_morph(). Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27openvswitch: Fix double-free on ip_defrag() errorsJoe Stringer
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is freed. When handle_fragments() passes this back up to do_execute_actions(), it will be freed again. Prevent this double free by never freeing the skb in do_execute_actions() for errors returned by ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22openvswitch: Fix egress tunnel info.Pravin B Shelar
While transitioning to netdev based vport we broke OVS feature which allows user to retrieve tunnel packet egress information for lwtunnel devices. Following patch fixes it by introducing ndo operation to get the tunnel egress info. Same ndo operation can be used for lwtunnel devices and compat ovs-tnl-vport devices. So after adding such device operation we can remove similar operation from ovs-vport. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22openvswitch: Use dev_queue_xmit for vport send.Pravin B Shelar
With use of lwtunnel, we can directly call dev_queue_xmit() rather than calling netdev vport send operation. Following change make tunnel vport code bit cleaner. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22openvswitch: Fix incorrect type use.Pravin B Shelar
Patch fixes following sparse warning. net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types) net/openvswitch/flow_netlink.c:583:30: expected restricted __be16 [usertype] ipv4 net/openvswitch/flow_netlink.c:583:30: got int Fixes: 6b26ba3a7d ("openvswitch: netlink attributes for IPv6 tunneling") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Serialize nested ct actions if providedJoe Stringer
If userspace provides a ct action with no nested mark or label, then the storage for these fields is zeroed. Later when actions are requested, such zeroed fields are serialized even though userspace didn't originally specify them. Fix the behaviour by ensuring that no action is serialized in this case, and reject actions where userspace attempts to set these fields with mask=0. This should make netlink marshalling consistent across deserialization/reserialization. Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Mark connections new when not confirmed.Joe Stringer
New, related connections are marked as such as part of ovs_ct_lookup(), but they are not marked as "new" if the commit flag is used. Make this consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct). Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Reject ct_state masks for unknown bitsJoe Stringer
Currently, 0-bits are generated in ct_state where the bit position is undefined, and matches are accepted on these bit-positions. If userspace requests to match the 0-value for this bit then it may expect only a subset of traffic to match this value, whereas currently all packets will have this bit set to 0. Fix this by rejecting such masks. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Allocate memory for ovs internal device stats.James Morse
"openvswitch: Remove vport stats" removed the per-vport statistics, in order to use the netdev's statistics fields. "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats to user-space, by using the provided netdev_ops to collate them - but ovs internal devices still use an unallocated dev->tstats field to count packets, which are no longer exported by this api. Allocate the dev->tstats field for ovs internal devices, and wire up ndo_get_stats64 with the original implementation of ovs_vport_get_stats(). On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs, unmasking a full-on panic on arm64: =============%<============== [<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch] [<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch] [<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch] [<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch] [<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch] [<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310 [<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4 [<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc [<ffffffc0005e8a94>] genl_rcv+0x38/0x50 [<ffffffc0005e76c0>] netlink_unicast+0x164/0x210 [<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368 [<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c [SNIP] Kernel panic - not syncing: Fatal exception in interrupt =============%<============== Fixes: 8c876639c985 ("openvswitch: Remove vport stats.") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18openvswitch: Scrub skb between namespacesJoe Stringer
If OVS receives a packet from another namespace, then the packet should be scrubbed. However, people have already begun to rely on the behaviour that skb->mark is preserved across namespaces, so retain this one field. This is mainly to address information leakage between namespaces when using OVS internal ports, but by placing it in ovs_vport_receive() it is more generally applicable, meaning it should not be overlooked if other port types are allowed to be moved into namespaces in future. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12ipv6: Pass struct net into nf_ct_frag6_gatherEric W. Biederman
The function nf_ct_frag6_gather is called on both the input and the output paths of the networking stack. In particular ipv6_defrag which calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain on input and the LOCAL_OUT chain on output. The addition of a net parameter makes it explicit which network namespace the packets are being reassembled in, and removes the need for nf_ct_frag6_gather to guess. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12ipv4: Pass struct net into ip_defrag and ip_check_defragEric W. Biederman
The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMITJoe Stringer
Previously, the CT_ATTR_FLAGS attribute, when nested under the OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the semantics of the ct action. It's more extensible to just represent each flag as a nested attribute, and this requires no additional error checking to reject flags that aren't currently supported. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Extend ct_state match field to 32 bitsJoe Stringer
The ct_state field was initially added as an 8-bit field, however six of the bits are already being used and use cases are already starting to appear that may push the limits of this field. This patch extends the field to 32 bits while retaining the internal representation of 8 bits. This should cover forward compatibility of the ABI for the foreseeable future. This patch also reorders the OVS_CS_F_* bits to be sequential. Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Reject ct_state unsupported bitsJoe Stringer
Previously, if userspace specified ct_state bits in the flow key which are currently undefined (and therefore unsupported), then they would be ignored. This could cause unexpected behaviour in future if userspace is extended to support additional bits but attempts to communicate with the current version of the kernel. This patch rectifies the situation by rejecting such ct_state bits. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Ensure flow is valid before executing ctJoe Stringer
The ct action uses parts of the flow key, so we need to ensure that it is valid before executing that action. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Fix skb leak in ovs_fragment()Joe Stringer
If ovs_fragment() was unable to fragment the skb due to an L2 header that exceeds the supported length, skbs would be leaked. Fix the bug. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: netlink attributes for IPv6 tunnelingJiri Benc
Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: add tunnel protocol to sw_flow_keyJiri Benc
Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now also acts as an indicator whether the flow contains tunnel data (this was previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses in an union with IPv4 ones this won't work anymore). The new field was added to a hole in sw_flow_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05openvswitch: Fix ovs_vport_get_stats()Pravin B Shelar
Not every device has dev->tstats set. So when OVS tries to calculate vport stats it causes kernel panic. Following patch fixes it by using standard API to get net-device stats. ---8<--- Unable to handle kernel paging request at virtual address 766b4008 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: vport_vxlan vxlan ip6_udp_tunnel udp_tunnel tun bridge stp llc openvswitch ipv6 CPU: 7 PID: 1108 Comm: ovs-vswitchd Not tainted 4.3.0-rc3+ #82 PC is at ovs_vport_get_stats+0x150/0x1f8 [openvswitch] <snip> Call trace: [<ffffffbffc0859f8>] ovs_vport_get_stats+0x150/0x1f8 [openvswitch] [<ffffffbffc07cdb0>] ovs_vport_cmd_fill_info+0x140/0x1e0 [openvswitch] [<ffffffbffc07cf0c>] ovs_vport_cmd_dump+0xbc/0x138 [openvswitch] [<ffffffc00045a5ac>] netlink_dump+0xb8/0x258 [<ffffffc00045ace0>] __netlink_dump_start+0x120/0x178 [<ffffffc00045dd9c>] genl_family_rcv_msg+0x2d4/0x308 [<ffffffc00045de58>] genl_rcv_msg+0x88/0xc4 [<ffffffc00045cf24>] netlink_rcv_skb+0xd4/0x100 [<ffffffc00045dab0>] genl_rcv+0x30/0x48 [<ffffffc00045c830>] netlink_unicast+0x154/0x200 [<ffffffc00045cc9c>] netlink_sendmsg+0x308/0x364 [<ffffffc00041e10c>] sock_sendmsg+0x14/0x2c [<ffffffc000420d58>] SyS_sendto+0xbc/0xf0 Code: aa1603e1 f94037a4 aa1303e2 aa1703e0 (f9400465) Reported-by: Tomasz Sawicki <tomasz.sawicki@objectiveintegration.uk> Fixes: 8c876639c98 ("openvswitch: Remove vport stats.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05ovs: do not allocate memory from offline numa nodeKonstantin Khlebnikov
When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>