Age | Commit message (Collapse) | Author |
|
Because for now the xt_qtaguid module allows procs to use tags without
having /dev/xt_qtaguid open, there was a case where it would try
to delete a resources from a list that was proc specific.
But that resource was never added to that list which is only
used when /dev/xt_qtaguid has been opened by the proc.
Once our userspace is fully updated, we won't need those exceptions.
Change-Id: Idd4bfea926627190c74645142916e10832eb2504
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Change-Id: I3bf165c31f35a6c7dc212f23df5eefaeb8129d0d
Signed-off-by: Ashish Sharma <ashishsharma@google.com>
|
|
In cases where the skb would have an sk_socket but no file, that skb
would not be counted at all. Assigning to uid 0 now.
Adding extra counters to track skb counts.
Change-Id: If049b4b525e1fbd5afc9c72b4a174c0a435f2ca7
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Crash fix
The delete command would delete a socket tag entry without removing it
from the proc_qtu_data { ..., sock_tag_list, }.
This in turn would cause an exiting process to crash while cleaning up
its matching proc_qtu_data.
* Added more aggressive tracking/cleanup of proc_qtu_data
This should allow one process to cleanup qtu_tag_data{} left around from
processes that didn't use resource tracking via /dev/xt_qtaguid.
* Debug printing tweaks
Better code inclusion/exclusion handling,
and extra debug out of full state.
Change-Id: I735965af2962ffcd7f3021cdc0068b3ab21245c2
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Make the warning less scary.
Change-Id: I0276c5413e37ec991f24db57aeb90333fb1b5a65
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
There is a
/proc/net/xt_qtaguid/iface/<iface>/{rx_bytes,rx_packets,tx_bytes,...}
but for better convenience and to avoid getting overly stale net/dev stats
we now have
/proc/net/xt_qtaguid/iface_stat_all
which outputs lines of:
iface_name active rx_bytes rx_packets tx_bytes tx_packets
net_dev_rx_bytes net_dev_rx_packets net_dev_tx_bytes net_dev_tx_packets
Change-Id: I12cc10d2d123b86b56d4eb489b1d77b2ce72ebcf
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Most net devs will not reset their stats when just going down/up,
unless a NETDEV_UNREGISTER was notified.
But some devs will not send out a NETDEV_UNREGISTER but still
reset their stats just before a NETDEV_UP.
Now we just track the dev stats during NETDEV_DOWN... just in case.
Then on NETDEV_UP we check the stats: if the device didn't do a
NETDEV_UNREGISTER and a prior NETDEV_DOWN captured stats, then we treat
it as an UNREGISTER and save the totals from the stashed values.
Added extra netdev event debugging.
Change-Id: Iec79e74bfd40269aa3e5892f161be71e09de6946
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
When a process doesn't have /dev/xt_qtaguid open, only warn once
instead of for every ctrl access.
Change-Id: I98a462a8731254ddc3bf6d2fefeef9823659b1f0
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Added global resource tracking based on tags.
- Can be put into passive mode via
/sys/modules/xt_qtaguid/params/tag_tracking_passive
- The number of socket tags per UID is now limited
- Adding /dev/xt_qtaguid that each process should open before starting
to tag sockets. A later change will make it a "must".
- A process should not create new tags unless it has the dev open.
A later change will make it a must.
- On qtaguid_resources release, the process' matching socket tag info
is deleted.
* Support run-time debug mask via /sys/modules parameter "debug_mask".
* split module into prettyprinting code, includes, main.
* Removed ptrdiff_t usage which didn't work in all cases.
Change-Id: I4a21d3bea55d23c1c3747253904e2a79f7d555d9
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
"cat /proc/net/xt_qtaguid/stats"
for a non-priviledged UID would output multiple twice its own stats.
The fix tweaks the way lines are counted.
Non-root:
idx iface acct_tag_hex uid_tag_int cnt_set ...
2 wlan0 0x0 10022 0 ...
3 wlan0 0x0 10022 1 ...
4 wlan0 0x3010000000000000 10022 0 ...
5 wlan0 0x3010000000000000 10022 1 ...
Root:
idx iface acct_tag_hex uid_tag_int cnt_set
2 wlan0 0x0 0 0 ...
3 wlan0 0x0 0 1 ...
4 wlan0 0x0 1000 0 ...
...
12 wlan0 0x0 10022 0 ...
13 wlan0 0x0 10022 1 ...
...
18 wlan0 0x3010000000000000 10022 0 ...
19 wlan0 0x3010000000000000 10022 1 ...
Change-Id: I3cae1f4fee616bc897831350374656b0c718c45b
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
Turns out that some devices don't call the notifier chains
with NETDEV_UNREGISTER.
So now we only track up/down as the points for tracking
active/inactive transitions and saving the get_dev_stats().
Change-Id: I948755962b4c64150b4d04f294fb4889f151e42b
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
/proc/net/xt_qtaguid/ctrl will now show:
active tagged sockets: lines of "sock=%p tag=0x%llx (uid=%u)"
sockets_tagged, : the number of sockets successfully tagged.
sockets_untagged: the number of sockets successfully untagged.
counter_set_changes: ctrl counter set change requests.
delete_cmds: ctrl delete commands completed.
iface_events: number of NETDEV_* events handled.
match_found_sk: sk found in skbuff without ct assist.
match_found_sk_in_ct: the number of times the connection tracker found
a socket for us. This happens when the skbuff didn't have info.
match_found_sk_none: the number of times no sk could be determined
successfully looked up. This indicates we don't know who the
data actually belongs to. This could be unsolicited traffic.
Change-Id: I3a65613bb24852e1eea768ab0320a6a7073ab9be
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
sockfd_put() risks sleeping.
So when doing a delete ctrl command, defer the sockfd_put() and
kfree() to outside of the spinlock.
Change-Id: I5f8ab51d05888d885b2fbb035f61efa5b7abb88a
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Don't hold the sockets after tagging.
sockfd_lookup() does a get() on the associated file.
There was no matching put() so a closed socket could never be
freed.
* Don't rely on struct member order for tag_node
The structs that had a struct tag_node member would work with
the *_tree_* routines only because tag_node was 1st.
* Improve debug messages
Provide info on who the caller is. Use unsigned int for uid.
* Only process NETDEV_UP events.
* Pacifier: disable netfilter matching. Leave .../stats header.
Change-Id: Iccb8ae3cca9608210c417597287a2391010dff2c
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Allow tracking interfaces that only have an ipv6 address.
Deal with ipv6 notifier chains that do NETDEV_UP without the rtnl_lock()
* Allow root all access to procfs ctrl/stats.
To disable all checks:
echo 0 > /sys/module/xt_qtaguid/parameters/ctrl_write_gid
echo 0 > /sys/module/xt_qtaguid/parameters/stats_readall_gid
* Add CDEBUG define to enable pr_debug output specific to
procfs ctrl/stats access.
Change-Id: I9a469511d92fe42734daff6ea2326701312a161b
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Added support for sets of counters.
By default set 0 is active.
Userspace can control which set is active for a given UID by
writing to .../ctrl
s <set_num> <uid>
Changing the active set is only permitted for processes in the
AID_NET_BW_ACCT group.
The active set tracking is reset when the uid tag is deleted with
the .../ctrl command
d 0 <uid>
* New output format for the proc .../stats
- Now has cnt_set in the list.
"""
idx iface acct_tag_hex uid_tag_int cnt_set rx_bytes rx_packets tx_bytes tx_packets rx_tcp_packets rx_tcp_bytes rx_udp_packets rx_udp_bytes rx_other_packets rx_other_bytes tx_tcp_packets tx_tcp_bytes tx_udp_packets tx_udp_bytes tx_other_packets tx_other_bytes
...
2 rmnet0 0x0 1000 0 27729 29 1477 27 27501 26 228 3 0 0 1249 24 228 3 0 0
2 rmnet0 0x0 1000 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rmnet0 0x0 10005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 rmnet0 0x0 10005 1 46407 57 8008 64 46407 57 0 0 0 0 8008 64 0 0 0 0
...
6 rmnet0 0x7fff000100000000 10005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 rmnet0 0x7fff000100000000 10005 1 27493 24 1564 22 27493 24 0 0 0 0 1564 22 0 0 0 0
"""
* Refactored for proc stats output code.
* Silenced some of the per packet debug output.
* Reworded some of the debug messages.
* Replaced all the spin_lock_irqsave/irqrestore with *_bh():
netfilter handling is done in softirq.
Change-Id: Ibe89f9d754579fd97335617186c614b43333cfd3
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
This would cause log spam to the point of slowing down the system.
Change-Id: I5655f0207935004b0198f43ad0d3c9ea25466e4e
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* uid handling
- Limit UID impersonation to processes with a gid in AID_NET_BW_ACCT.
This affects socket tagging, and data removal.
- Limit stats lookup to own uid or the process gid is in AID_NET_BW_STATS.
This affects stats lookup.
* allow pacifying the module
Setting passive to Y/y will make the module return immediately on
external stimulus.
No more stats and silent success on ctrl writes.
Mainly used when one suspects this module of misbehaving.
Change-Id: I83990862d52a9b0922aca103a0f61375cddeb7c4
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
* Add a new ctrl command to delete stored data.
d <acct_tag> [<uid>]
The uid will default to the running process's.
The accounting tag can be 0, in which case all counters and socket tags
associated with the uid will be cleared.
* Simplify the ctrl command handling at the expense of duplicate code.
This should make it easier to maintain.
* /proc/net/xt_qtaguid/stats now returns more stats
idx iface acct_tag_hex uid_tag_int
{rx,tx}_{bytes,packets}
{rx,tx}_{tcp,udp,other}_{bytes,packets}
the {rx,tx}_{bytes,packets} are the totals.
* re-tagging will now allow changing the uid.
Change-Id: I9594621543cefeab557caa3d68a22a3eb320466d
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
This uses the NETLINK NETLINK_NFLOG family to log a single message
when the quota limit is reached.
It uses the same packet type as ipt_ULOG, but
- never copies skb data,
- uses 112 as the event number (ULOG's +1)
It doesn't log if the module param "event_num" is 0.
Change-Id: I6f31736b568bb31a4ff0b9ac2ee58380e6b675ca
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The xt_quota2 came from
http://sourceforge.net/projects/xtables-addons/develop
It needed tweaking for it to compile within the kernel tree.
Fixed kmalloc() and create_proc_entry() invocations within
a non-interruptible context.
Removed useless copying of current quota back to the iptable's
struct matchinfo:
- those are per CPU: they will change randomly based on which
cpu gets to update the value.
- they prevent matching a rule: e.g.
-A chain -m quota2 --name q1 --quota 123
can't be followed by
-D chain -m quota2 --name q1 --quota 123
as the 123 will be compared to the struct matchinfo's quota member.
Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
This time the symptom is caused by tagging the same socket twice
without untagging it in between.
This would cause it to not unlock, and return.
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
When processing args passed to the procfs ctrl, if the tag was
invalid it would exit without releasing the spin_lock...
Bye bye scheduling.
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: Ic1480ae9d37bba687586094cf6d0274db9c5b28a
|
|
(This is a direct cherry-pick from 2.6.39: I3b925802)
Fixed procreader for /proc/net/xt_qtaguid/ctrl: it would just
fill the output with the same entry.
Simplify the **start handling.
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: I3b92580228f2b57795bb2d0d6197fc95ab6be552
|
|
(This is a direct cherry pick from 2.6.39: Id2a9912b)
* xt_socket_get_sk() returns invalid sockets when the sk_state is TCP_TIME_WAIT.
Added detection of time-wait.
* Added more constrained usage: qtaguid insures that xt_socket_get*_sk() is
not invoked for unexpected hooks or protocols (but I have not seen those
active at the point where the returned sk is bad).
Signed-off-by: JP Abgrall <jpa@google.com>
Change-Id: Id2a9912bb451a3e59d012fc55bbbd40fbb90693f
|
|
This module allows tracking stats at the socket level for given UIDs.
It replaces xt_owner.
If the --uid-owner is not specified, it will just count stats based on
who the skb belongs to. This will even happen on incoming skbs as it
looks into the skb via xt_socket magic to see who owns it.
If an skb is lost, it will be assigned to uid=0.
To control what sockets of what UIDs are tagged by what, one uses:
echo t $sock_fd $accounting_tag $the_billed_uid \
> /proc/net/xt_qtaguid/ctrl
So whenever an skb belongs to a sock_fd, it will be accounted against
$the_billed_uid
and matching stats will show up under the uid with the given
$accounting_tag.
Because the number of allocations for the stats structs is not that big:
~500 apps * 32 per app
we'll just do it atomic. This avoids walking lists many times, and
the fancy worker thread handling. Slabs will grow when needed later.
It use netdevice and inetaddr notifications instead of hooks in the core dev
code to track when a device comes and goes. This removes the need for
exposed iface_stat.h.
Put procfs dirs in /proc/net/xt_qtaguid/
ctrl
stats
iface_stat/<iface>/...
The uid stats are obtainable in ./stats.
Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
The socket matching function has some nifty logic to get the struct sock
from the skb or from the connection tracker.
We export this so other xt_* can use it, similarly to ho how
xt_socket uses nf_tproxy_get_sock.
Change-Id: I11c58f59087e7f7ae09e4abd4b937cd3370fa2fd
Signed-off-by: JP Abgrall <jpa@google.com>
|
|
|
|
ip_vs_mutext is used by both netns shutdown code and startup
and both implicit uses sk_lock-AF_INET mutex.
cleanup CPU-1 startup CPU-2
ip_vs_dst_event() ip_vs_genl_set_cmd()
sk_lock-AF_INET __ip_vs_mutex
sk_lock-AF_INET
__ip_vs_mutex
* DEAD LOCK *
A new mutex placed in ip_vs netns struct called sync_mutex is added.
Comments from Julian and Simon added.
This patch has been running for more than 3 month now and it seems to work.
Ver. 3
IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
instead of __ip_vs_mutex as sugested by Julian.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Replace the open coded initialization with the init function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
GRE connections cause ctnetlink event flood because the ASSURED event
is set for every packet received.
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
|
|
tcp_sack skips fastpath
The wrong multiplication of TCPOLEN_TSTAMP_ALIGNED by 4 skips the fast path
for the timestamp-only option. Bug reported by Michael M. Builov (netfilter
bugzilla #738).
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Michael M. Builov reported that in the tcp_options and tcp_sack functions
of netfilter TCP conntrack the incorrect handling of invalid TCP option
with too big opsize may lead to read access beyond tcp-packet or buffer
allocated on stack (netfilter bugzilla #738). The fix is to stop parsing
the options at detecting the broken option.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
When both the server and the client are NATed, the set-link-info control
packet containing the peer's call-id field is not properly translated.
I have verified that it was working in 2.6.16.13 kernel previously but
due to rewrite, this scenario stopped working (Not knowing exact version
when it stopped working).
Signed-off-by: Sanket Shah <sanket.shah@elitecore.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.
This problem was previously fixed via
64507fdbc29c3a622180378210ecea8659b14e40 (netfilter:
nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
NF_STOLEN can also be returned by a netfilter hook when iterating the
rules in nf_reinject.
Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.
This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9
(netfilter: avoid double free in nf_reinject).
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
NF_STOLEN means skb was already freed
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 4a5a5c73b7cfee (slightly better error reporting) added some
useless code in xt_rateest_mt_checkentry().
Fix this so that different error codes can really be returned.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-2.6
|
|
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Fix wrong check in list_splice_init_rcu()
net,rcu: Convert call_rcu(xt_rateest_free_rcu) to kfree_rcu()
sysctl,rcu: Convert call_rcu(free_head) to kfree
vmalloc,rcu: Convert call_rcu(rcu_free_vb) to kfree_rcu()
vmalloc,rcu: Convert call_rcu(rcu_free_va) to kfree_rcu()
ipc,rcu: Convert call_rcu(ipc_immediate_free) to kfree_rcu()
ipc,rcu: Convert call_rcu(free_un) to kfree_rcu()
security,rcu: Convert call_rcu(sel_netport_free) to kfree_rcu()
security,rcu: Convert call_rcu(sel_netnode_free) to kfree_rcu()
ia64,rcu: Convert call_rcu(sn_irq_info_free) to kfree_rcu()
block,rcu: Convert call_rcu(disk_free_ptbl_rcu_cb) to kfree_rcu()
scsi,rcu: Convert call_rcu(fc_rport_free_rcu) to kfree_rcu()
audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu()
security,rcu: Convert call_rcu(whitelist_item_free) to kfree_rcu()
md,rcu: Convert call_rcu(free_conf) to kfree_rcu()
|
|
This resolves a panic on module removal.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
|
|
different interfaces
If overlapping networks with different interfaces was added to
the set, the type did not handle it properly. Example
ipset create test hash:net,iface
ipset add test 192.168.0.0/16,eth0
ipset add test 192.168.0.0/24,eth1
Now, if a packet was sent from 192.168.0.0/24,eth0, the type returned
a match.
In the patch the algorithm is fixed in order to correctly handle
overlapping networks.
Limitation: the same network cannot be stored with more than 64 different
interfaces in a single set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
The RCU callback xt_rateest_free_rcu() just calls kfree(), so we can
use kfree_rcu() instead of call_rcu(). This also allows us to dispense
with an rcu_barrier() call, speeding up unloading of this module.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Introduces a new nfnetlink type that applies a given
verdict to all queued packets with an id <= the id in the verdict
message.
If a mark is provided it is applied to all matched packets.
This reduces the number of verdicts that have to be sent.
Applications that make use of this feature need to maintain
a timeout to send a batchverdict periodically to avoid starvation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Packet identifier is currently setup in nfqnl_build_packet_message(),
using one atomic_inc_return().
Problem is that since several cpus might concurrently call
nfqnl_enqueue_packet() for the same queue, we can deliver packets to
consumer in non monotonic way (packet N+1 being delivered after packet
N)
This patch moves the packet id setup from nfqnl_build_packet_message()
to nfqnl_enqueue_packet() to guarantee correct delivery order.
This also removes one atomic operation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
nenetlink_queue operations on SMP are not efficent if several queues are
used, because of nfnl_mutex contention when applications give packet
verdict.
Use new call_rcu field in struct nfnl_callback to advertize a callback
that is called under rcu_read_lock instead of nfnl_mutex.
On my 2x4x2 machine, I was able to reach 2.000.000 pps going through
user land returning NF_ACCEPT verdicts without losses, instead of less
than 500.000 pps before patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Goal of this patch is to permit nfnetlink providers not mandate
nfnl_mutex being held while nfnetlink_rcv_msg() calls them.
If struct nfnl_callback contains a non NULL call_rcu(), then
nfnetlink_rcv_msg() will use it instead of call() field, holding
rcu_read_lock instead of nfnl_mutex
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|