summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
AgeCommit message (Collapse)Author
2019-08-25bnx2x: Fix VF's VLAN reconfiguration in reload.Manish Chopra
[ Upstream commit 4a4d2d372fb9b9229327e2ed01d5d9572eddf4de ] Commit 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence."), introduced a regression in driver that as a part of VF's reload flow, VLANs created on the VF doesn't get re-configured in hardware as vlan metadata/info was not getting cleared for the VFs which causes vlan PING to stop. This patch clears the vlan metadata/info so that VLANs gets re-configured back in the hardware in VF's reload flow and PING/traffic continues for VLANs created over the VFs. Fixes: 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence.") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com> Signed-off-by: Shahed Shaikh <shshaikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09bnx2x: Disable multi-cos feature.Sudarsana Reddy Kalluru
[ Upstream commit d1f0b5dce8fda09a7f5f04c1878f181d548e42f5 ] Commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") which enabled multi-cos feature after prolonged time in driver added some regression causing numerous issues (sudden reboots, tx timeout etc.) reported by customers. We plan to backout this commit and submit proper fix once we have root cause of issues reported with this feature enabled. Fixes: 3968d38917eb ("bnx2x: Fix Multi-Cos.") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31bnx2x: Prevent ptp_task to be rescheduled indefinitelyGuilherme G. Piccoli
[ Upstream commit 3c91f25c2f72ba6001775a5932857c1d2131c531 ] Currently bnx2x ptp worker tries to read a register with timestamp information in case of TX packet timestamping and in case it fails, the routine reschedules itself indefinitely. This was reported as a kworker always at 100% of CPU usage, which was narrowed down to be bnx2x ptp_task. By following the ioctl handler, we could narrow down the problem to an NTP tool (chrony) requesting HW timestamping from bnx2x NIC with RX filter zeroed; this isn't reproducible for example with ptp4l (from linuxptp) since this tool requests a supported RX filter. It seems NIC FW timestamp mechanism cannot work well with RX_FILTER_NONE - driver's PTP filter init routine skips a register write to the adapter if there's not a supported filter request. This patch addresses the problem of bnx2x ptp thread's everlasting reschedule by retrying the register read 10 times; between the read attempts the thread sleeps for an increasing amount of time starting in 1ms to give FW some time to perform the timestamping. If it still fails after all retries, we bail out in order to prevent an unbound resource consumption from bnx2x. The patch also adds an ethtool statistic for accounting the skipped TX timestamp packets and it reduces the priority of timestamping error messages to prevent log flooding. The code was tested using both linuxptp and chrony. Reported-and-tested-by: Przemyslaw Hausman <przemyslaw.hausman@canonical.com> Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31bnx2x: Prevent load reordering in tx completion processingBrian King
[ Upstream commit ea811b795df24644a8eb760b493c43fba4450677 ] This patch fixes an issue seen on Power systems with bnx2x which results in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in bnx2x_tx_int. Adding a read memory barrier resolves the issue. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24bnx2x: Fix receiving tx-timeout in error or recovery state.Sudarsana Reddy Kalluru
[ Upstream commit 484c016d9392786ce5c74017c206c706f29f823d ] Driver performs the internal reload when it receives tx-timeout event from the OS. Internal reload might fail in some scenarios e.g., fatal HW issues. In such cases OS still see the link, which would result in undesirable functionalities such as re-generation of tx-timeouts. The patch addresses this issue by indicating the link-down to OS when tx-timeout is detected, and keeping the link in down state till the internal reload is successful. Please consider applying it to 'net' branch. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03bnx2x: Improve reliability in case of nested PCI errorsGuilherme G. Piccoli
[ Upstream commit f7084059a9cb9e56a186e1677b1dcffd76c2cd24 ] While in recovery process of PCI error (called EEH on PowerPC arch), another PCI transaction could be corrupted causing a situation of nested PCI errors. Also, this scenario could be reproduced with error injection mechanisms (for debug purposes). We observe that in case of nested PCI errors, bnx2x might attempt to initialize its shmem and cause a kernel crash due to bad addresses read from MCP. Multiple different stack traces were observed depending on the point the second PCI error happens. This patch avoids the crashes by: * failing PCI recovery in case of nested errors (since multiple PCI errors in a row are not expected to lead to a functional adapter anyway), and by, * preventing access to adapter FW when MCP is failed (we mark it as failed when shmem cannot get initialized properly). Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Shahed Shaikh <Shahed.Shaikh@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07net: sched: get rid of struct tc_to_netdevJiri Pirko
Get rid of struct tc_to_netdev which is now just unnecessary container and rather pass per-type structures down to drivers directly. Along with that, consolidate the naming of per-type structure variables in cls_*. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: change return value of ndo_setup_tc for driver supporting mqprio ↵Jiri Pirko
only Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the drivers have it like that, so be aligned. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: push cls related args into cls_common structureJiri Pirko
As ndo_setup_tc is generic offload op for whole tc subsystem, does not really make sense to have cls-specific args. So move them under cls_common structurure which is embedded in all cls structs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: make type an argument for ndo_setup_tcJiri Pirko
Since the type is always present, push it to be a separate argument to ndo_setup_tc. On the way, name the type enum and use it for arg type. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-10bnx2x: Allow vfs to disable txvlan offloadMintz, Yuval
VF clients are configured as enforced, meaning firmware is validating the correctness of their ethertype/vid during transmission. Once txvlan is disabled, VF would start getting SKBs for transmission here vlan is on the payload - but it'll pass the packet's ethertype instead of the vid, leading to firmware declaring it as malicious. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: propagate tc filter chain index down the ndo_setup_tc callJiri Pirko
We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01bnx2x: Fix Multi-CosMintz, Yuval
Apparently multi-cos isn't working for bnx2x quite some time - driver implements ndo_select_queue() to allow queue-selection for FCoE, but the regular L2 flow would cause it to modulo the fallback's result by the number of queues. The fallback would return a queue matching the needed tc [via __skb_tx_hash()], but since the modulo is by the number of TSS queues where number of TCs is not accounted, transmission would always be done by a queue configured into using TC0. Fixes: ada7c19e6d27 ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30bnx2x: Align RX buffersScott Wood
The bnx2x driver is not providing proper alignment on the receive buffers it passes to build_skb(), causing skb_shared_info to be misaligned. skb_shared_info contains an atomic, and while PPC normally supports unaligned accesses, it does not support unaligned atomics. Aligning the size of rx buffers will ensure that page_frag_alloc() returns aligned addresses. This can be reproduced on PPC by setting the network MTU to 1450 (or other non-multiple-of-4) and then generating sufficient inbound network traffic (one or two large "wget"s usually does it), producing the following oops: Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656 Faulting instruction address: 0xc00000000080ef8c Oops: Kernel access of bad area, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761da #2 task: c00000ffd4892400 task.stack: c00000ffd4920000 NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320 REGS: c00000ffffc33710 TRAP: 0600 Not tainted (4.11.0-rc8-00088-g4c761da) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24082042 XER: 00000000 CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1 GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100 GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000 GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001 GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000 GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100 NIP [c00000000080ef8c] __skb_clone+0xdc/0x140 LR [c00000000080eee8] __skb_clone+0x38/0x140 Call Trace: [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable) [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510 [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80 [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0 [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260 [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x] [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480 [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0 [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120 [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200 [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24 [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110 [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160 Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15mqprio: Modify mqprio to pass user parameters via ndo_setup_tc.Amritha Nambiar
The configurable priority to traffic class mapping and the user specified queue ranges are used to configure the traffic class, overriding the hardware defaults when the 'hw' option is set to 0. However, when the 'hw' option is non-zero, the hardware QOS defaults are used. This patch makes it so that we can pass the data the user provided to ndo_setup_tc. This allows us to pull in the queue configuration if the user requested it as well as any additional hardware offload type requested by using a value other than 1 for the hw value. Finally it also provides a means for the device driver to return the level supported for the offload type via the qopt->hw value. Previously we were just always assuming the value to be 1, in the future values beyond just 1 may be supported. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30drivers: net: generalize napi_complete_done()Eric Dumazet
napi_complete_done() allows to opt-in for gro_flush_timeout, added back in linux-3.19, commit 3b47d30396ba ("net: gro: add a per device gro flush timer") This allows for more efficient GRO aggregation without sacrifying latencies. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-23bnx2x: avoid two atomic ops per page on x86Eric Dumazet
Commit 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element") added extra put_page() and get_page() calls on arches where PAGE_SIZE=4K like x86 Reorder things to avoid this overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03bnx2x: use reset to set network headerZhang Shengju
Since offset is zero, it's not necessary to use set function. Reset function is straightforward, and will remove the unnecessary add operation in set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16bnx2x: switch to napi_complete_done()Eric Dumazet
Switch from napi_complete() to napi_complete_done() for better GRO support (gro_flush_timeout) and core NAPI features. Do not rearm interrupts if we are busy polling, to reduce bus and interrupts overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ethernet/broadcom: use core min/max MTU checkingJarod Wilson
tg3: min_mtu 60, max_mtu 9000/1500 bnxt: min_mtu 60, max_mtu 9000 bnx2x: min_mtu 46, max_mtu 9600 - Fix up ETH_OVREHEAD -> ETH_OVERHEAD while we're in here, remove duplicated defines from bnx2x_link.c. bnx2: min_mtu 46, max_mtu 9000 - Use more standard ETH_* defines while we're at it. bcm63xx_enet: min_mtu 46, max_mtu 2028 - compute_hw_mtu was made largely pointless, and thus merged back into bcm_enet_change_mtu. b44: min_mtu 60, max_mtu 1500 CC: netdev@vger.kernel.org CC: Michael Chan <michael.chan@broadcom.com> CC: Sony Chacko <sony.chacko@qlogic.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Dept-HSGLinuxNICDev@qlogic.com CC: Siva Reddy Kallam <siva.kallam@broadcom.com> CC: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16bnx2x: don't wait for Tx completion on recoveryYuval Mintz
When driver has hit a parity event, HW can no longer write to host memory. As a result, Tx completions cannot be written to the host SB memory, and waiting for Tx completions eventually timeout. As driver is willing to delay as much as 1-2 seconds per Tx queue for its draining and this delay is sequential, the time to recover might greatly lengthen needlessly in case the recovery is done under multi-connection traffic. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03net: relax setup_tc ndo op handle restrictionJohn Fastabend
I added this check in setup_tc to multiple drivers, if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) Unfortunately restricting to TC_H_ROOT like this breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test to only check the type to make it equivalent to the check before I broke it. With this the old instantiation continues to work. A good smoke test is to setup mqprio with, # tc qdisc add dev eth4 root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7 Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle paramete") Reported-by: Singh Krishneil <krishneil.k.singh@intel.com> Reported-by: Jake Keller <jacob.e.keller@intel.com> CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: rework setup_tc ndo op to consume general tc operandJohn Fastabend
This patch updates setup_tc so we can pass additional parameters into the ndo op in a generic way. To do this we provide structured union and type flag. This lets each classifier and qdisc provide its own set of attributes without having to add new ndo ops or grow the signature of the callback. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17net: rework ndo tc op to consume additional qdisc handle parameterJohn Fastabend
The ndo_setup_tc() op was added to support drivers offloading tx qdiscs however only support for mqprio was ever added. So we only ever added support for passing the number of traffic classes to the driver. This patch generalizes the ndo_setup_tc op so that a handle can be provided to indicate if the offload is for ingress or egress or potentially even child qdiscs. CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16bnx2x: Remove unneccessary EXPORT_SYMBOLYuval Mintz
bnx2x_schedule_sp_rtnl is exported by bnx2x, although no other module uses it. Reported-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-12-18bnx2x: Prevent FW assertion when using VxlanYuval Mintz
FW has a rare corner case in which a fragmented packet using lots of frags would not be linearized, causing the FW to assert while trying to transmit the packet. To prevent this, we need to make sure the window of fragements containing MSS worth of data contains 1 BD less than for regular packets due to the additional parsing BD. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08bnx2x: remove rx_pkt/rx_callsEric Dumazet
These fields are updated but never read. Remove the overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08bnx2x: avoid soft lockup in bnx2x_poll()Eric Dumazet
Under heavy TX load, bnx2x_poll() can loop forever and trigger soft lockup bugs. A napi poll handler must yield after one TX completion round, risk of livelock is too high otherwise. Bug is very easy to trigger using a debug build, and udp flood, because of added cpu cycles in TX completion, and we do not receive enough packets to break the loop. Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ariel Elior <ariel.elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-05bnx2x: change FW GRO error message to WARN_ONCEMichal Schmidt
It's supposed to be impossible for TPA to give us anything else than IPv4 or IPv6 here. But in case there is a way to reach this error by some strange received frames, we don't want to flood the kernel log. WARN_ONCE is better for this purpose. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-05bnx2x: drop redundant error message about allocation failureMichal Schmidt
alloc_pages() already prints a warning when it fails. No need to emit another message. Certainly not at KERN_ERR level, because it is no big deal if this GFP_ATOMIC allocation fails occasionally. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18net: provide generic busy polling to all NAPI driversEric Dumazet
NAPI drivers no longer need to observe a particular protocol to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y) napi_hash_add() and napi_hash_del() are automatically called from core networking stack, respectively from netif_napi_add() and netif_napi_del() This patch depends on free_netdev() and netif_napi_del() being called from process context, which seems to be the norm. Drivers might still prefer to call napi_hash_del() on their own, since they might combine all the rcu grace periods into a single one, knowing their NAPI structures lifetime, while core networking stack has no idea of a possible combining. Once this patch proves to not bring serious regressions, we will cleanup drivers to either remove napi_hash_del() or provide appropriate rcu grace periods combining. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18net: move skb_mark_napi_id() into core networking stackEric Dumazet
We would like to automatically provide busy polling support to all NAPI drivers, without them having to implement anything. skb_mark_napi_id() can be called from napi_gro_receive() and napi_get_frags(). Few drivers are still calling skb_mark_napi_id() because they use netif_receive_skb(). They should eventually call napi_gro_receive() instead. I will leave this to drivers maintainers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18bnx2x: remove bnx2x_low_latency_recv() supportEric Dumazet
Switch to native NAPI polling, as this reduces overhead and complexity. Normal path is faster, since one cmpxchg() is not anymore requested, and busy polling with the NAPI polling has same performance. Tested: lpk50:~# cat /proc/sys/net/core/busy_read 70 lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 40095.07 16384 87380 IpInReceives 401062 0.0 IpInDelivers 401062 0.0 IpOutRequests 401079 0.0 TcpActiveOpens 7 0.0 TcpPassiveOpens 3 0.0 TcpAttemptFails 3 0.0 TcpEstabResets 5 0.0 TcpInSegs 401036 0.0 TcpOutSegs 401052 0.0 TcpOutRsts 38 0.0 UdpInDatagrams 26 0.0 UdpOutDatagrams 27 0.0 Ip6OutNoRoutes 1 0.0 TcpExtDelayedACKs 1 0.0 TcpExtTCPPrequeued 98 0.0 TcpExtTCPDirectCopyFromPrequeue 98 0.0 TcpExtTCPHPHits 4 0.0 TcpExtTCPHPHitsToUser 98 0.0 TcpExtTCPPureAcks 5 0.0 TcpExtTCPHPAcks 101 0.0 TcpExtTCPAbortOnData 6 0.0 TcpExtBusyPollRxPackets 400832 0.0 TcpExtTCPOrigDataSent 400983 0.0 IpExtInOctets 21273867 0.0 IpExtOutOctets 21261254 0.0 IpExtInNoECTPkts 401064 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-17bnx2: Fix bandwidth allocation for some MF modesYuval Mintz
Management firmware tells driver in case bandwidth configuration for a specific function exists, but [regretably] the same field has different meanings depending on the multi-function mode - it can either be a percentile value or an actual speed. For newer multi-function modes current logic is incorrect - driver understands values as actual speeds instead of percentages, causing the resulting chip configuration to be incorrect. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/cavium/Kconfig The cavium conflict was overlapping dependency changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10bnx2x: Prevent null pointer dereference on SKB releaseYuval Mintz
On error flows its possible to free an SKB even if it was not allocated. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29bnx2x: add vlan filtering offloadYuval Mintz
Current driver always uses vlan-promisc mode, i.e., it receives both tagged and untagged traffic and lets the network stack drop packets tagged with unrequested vlan tags. This patch implements vlan-filtering offload in the driver - Unless explicitly configured to promisc mode, only untagged packets or packets tagged with requested vlans would reach the Rx flow. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22bnx2x: Add MFW dump supportYuval Mintz
Devices with up-to-date management FW will be able to store register dumps on their persistent storage - in case management FW identifies a fatal error it would gather and store such dumps, which could later be retrieved using specific debug tools. This patch adds the necessary part in the driver in order to make the feature operational, as well as update users [under debug] during load in case their device contains a dump of a previous crash. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22bnx2x: new Multi-function mode - BDYuval Mintz
This adds support to a new multi-function mode, enabling driver to initialize such devices and correctly interacting with management FW for fully utilizing their features. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22bnx2x: Rebrand from 'broadcom' into 'qlogic'Yuval Mintz
bnx2x still appears as a Broadcom driver even though the devices it utilizes belong to Qlogic for more than a year. This patch changes the various headers and the device strings to indicate the correct ownership of the device. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22bnx2x: Utilize FW 7.12.30Yuval Mintz
This moves bnx2x into using 7.12.30 FW. Said firmware fixes the following: - Packets from a VF with pvid configured which were sent with a different vlan were transmitted instead of being discarded. - FCoE traffic might not recover after a failue while there's traffic to another function. In addition, this FW opens the door for the driver to implement several new features; Specifically, this enhances the device's support for encapsulated packets and will allow vxlan/geneve offloads to be added in the future, as well as vlan filtering offload. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28bnx2x: fix DMA API usageMichal Schmidt
With CONFIG_DMA_API_DEBUG=y bnx2x triggers the error "DMA-API: device driver frees DMA memory with wrong function". On archs where PAGE_SIZE > SGE_PAGE_SIZE it also triggers "DMA-API: device driver frees DMA memory with different size". Fix this by making the mapping and unmapping symmetric: - Do not map the whole pool page at once. Instead map the SGE_PAGE_SIZE-sized pieces individually, so they can be unmapped in the same manner. - What's mapped using dma_map_page() must be unmapped using dma_unmap_page(). Tested on ppc64. Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25bnx2x: Fix linearization for encapsulated packetsYuval Mintz
Due to FW constraints, driver must make sure that transmitted SKBs will not be too fragmented, or in the case that they are - that each 'window' of fragments passed to the FW would contain at least an mss worth of data. For encapsultaed packets the calculation is wrong, since it ignores the inner headers in the calculation of the headers' length. This could lead to a FW assertion in case of a too-fragmented encapsulated packet. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01bnx2x: Alloc 4k fragment for each rx ring buffer elementGabriel Krisman Bertazi
The driver allocates one page for each buffer on the rx ring, which is too much on architectures like ppc64 and can cause unexpected allocation failures when the system is under stress. Now, we keep a memory pool per queue, and if the architecture's PAGE_SIZE is greater than 4k, we fragment pages and assign each 4k segment to a ring element, which reduces the overall memory consumption on such architectures. This helps avoiding errors like the example below: [bnx2x_alloc_rx_sge:435(eth1)]Can't alloc sge [c00000037ffeb900] [d000000075eddeb4] .bnx2x_alloc_rx_sge+0x44/0x200 [bnx2x] [c00000037ffeb9b0] [d000000075ee0b34] .bnx2x_fill_frag_skb+0x1ac/0x460 [bnx2x] [c00000037ffebac0] [d000000075ee11f0] .bnx2x_tpa_stop+0x160/0x2e8 [bnx2x] [c00000037ffebb90] [d000000075ee1560] .bnx2x_rx_int+0x1e8/0xc30 [bnx2x] [c00000037ffebcd0] [d000000075ee2084] .bnx2x_poll+0xdc/0x3d8 [bnx2x] (unreliable) Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12bnx2x, tg3: Replace put_page(virt_to_head_page()) with skb_free_frag()Alexander Duyck
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04bnx2x: Fix to prevent inner-reloadYuval Mintz
Submit 909d9faae2a44 ("bnx2x: Prevent inner-reload while VFs exist") contained a bug - MTU change was not prevented by it; Instead, it `randomally' prevented bnx2x_resume() from running [harmless yet wrong]. This moves the check to its correct spot. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>