Age | Commit message (Collapse) | Author |
|
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.
The two assumptions were:
- since changes only happen in process context, read_lock doesn't need
bottem half protection. Now invalid since destruction of inner qdiscs,
classifiers, actions and estimators happens in the RCU callback unless
they're manually deleted, resulting in dead-locks when read_lock in
process context is interrupted by write_lock_bh in bottem half context.
- since changes only happen under the RTNL, no additional locking is
necessary for data not used during packet processing (f.e. u32_list).
Again, since destruction now happens in the RCU callback, this assumption
is not valid anymore, causing races while using this data, which can
result in corruption or use-after-free.
Instead of "fixing" this by disabling bottem halfs everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev->qdisc
pointer is protected by RCU, but ->enqueue and the qdisc tree are still
protected by dev->qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The ->set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.
So whip up a sockaddr to wrap around the netlink
attribute for the ->set_mac_address call.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.
CPU1 CPU2
dst_run_gc() entered dst_run_gc() entered
spin_lock(&dst_lock) .....
del_timer(&dst_gc_timer) fail to get lock
.... mod_timer() <--- puts
timer back
to the list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.
Found during OpenVZ internal testing.
At first we thought that it is OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed to me it is possible to trigger
this condition in mainstream kernel.
F.e. timer has fired on CPU2, but the handler was preeempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming. This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.
Examples include esp_output and ip_fragment.
Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.
It is possible to get away with this if we audit everything to make sure
that they always consult skb->len before going down onto frag_list. In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb. For now though, let's do the conservative fix
and update frag_list.
Many thanks to Marco Berizzi for helping me to track down this bug.
This 4-year old bug took 3 months to track down. Marco was very patient
indeed :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available. I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator. Shouldnt the gfp_mask be passed to alloc_skb() ?
Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).
Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.
What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.
The fix is simple enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Mirror the bug fix from fill_packet_ipv4()
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Signed-off-by: Chen-Li Tien <cltien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The function pointers which were checked were for their get_* counterparts.
Typically a copy-paste typo.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Jeff Garzik <jeff@garzik.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
ETHTOOL_GUFO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
On Sun, 9 Apr 2006 21:56:59 +0400,
Sergey Vlasov <vsu@altlinux.ru> wrote:
> However, show_address() does not output anything unless
> dev->reg_state == NETREG_REGISTERED - and this state is set by
> netdev_run_todo() only after netdev_register_sysfs() returns, so in
> the meantime (while netdev_register_sysfs() is busy adding the
> "statistics" attribute group) some process may see an empty "address"
> attribute.
I've tried the attached patch, suggested by Sergey Vlasov on
hotplug-devel@, and as far as i can test it works just fine.
Signed-off-by: Alexander Patrakov <patrakov@ums.usu.ru>
Signed-off-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
The user can pass us arbitrary garbage so we should ensure the
string they give us is null terminated before we pass it on
to dev_get_by_index() et al.
Found by Solar Designer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
In 295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9 I moved defer_accept from
tcp_sock to request_queue and mistakingly reset it at reqsl_queue_alloc, causing
calls to setsockopt(TCP_DEFER_ACCEPT ) to be lost after bind, the fix is to
remove the zeroing of rskq_defer_accept from reqsl_queue_alloc.
Thanks to Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> for
reporting and testing the suggested fix.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some of netfilter-related members are initalized / copied twice in
skb_clone(). Remove one.
Pointed out by Olivier MATZ <olivier.matz@6wind.com>.
And this patch also fixes order of copying / clearing members.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Revert the following changeset:
bc8dfcb93970ad7139c976356bfc99d7e251deaf
Recursive SKB frag lists are really possible and disallowing
them breaks things.
Noticed by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a netlink message is not related to a netlink socket,
it is issued by kernel socket with pid 0. Netlink "pid" has nothing
to do with current->pid. I called it incorrectly, if it was named "port",
the confusion would be avoided.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This information is already available via /proc/net/bonding/*
therefore it doesn't make sense to require CAP_NET_ADMIN
privileges.
Original patch by Laurent Deniel <laurent.deniel@free.fr>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On the error path if we allocated an fclone then we will free it in
the wrong pool.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes some whitespace issues in net/core/filter.c
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't assume 16.
Found by Ben Greear.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This replaces a memcmp() with is_zero_ether_addr().
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This replaces some tests with is_zero_ether_addr(), memcmp(one, two,
6) with compare_ether_addr(one, two), and 6 with ETH_ALEN where
appropriate.
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes some whitespace issues in net/core/filter.c
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes redundant comments, and moves one comment to a better
location.
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes more unneeded casts on the return value for kmalloc(),
sock_kmalloc(), and vmalloc().
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It needs <linux/etherdevice.h> for compare_ether_addr()
|
|
This changes some memcmp(one,two,ETH_ALEN) to compare_ether_addr(one,two).
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev->get_wireless_stats is deprecated but removing it also removes wireless
subdirectory in sysfs. This patch puts it back.
akpm: I don't know what's happening here. This might be appropriate as a
2.6.15.x compatibility backport. Waiting to hear from Jeff.
Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pktgen_find_thread() and pktgen_create_thread() are only called at
initialization time.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It should return an unsigned value, and fix sk_filter() as well.
Signed-off-by: Kris Katterjohn <kjak@ispwest.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial manual merge fixup for usb_find_interface clashes.
|
|
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
|
|
Recent udev versions don't longer cover bad sysfs timing with built-in
logic. Explicit rules are required to do that. For net devices, the
following is needed:
ACTION=="add", SUBSYSTEM=="net", WAIT_FOR_SYSFS="address"
to handle access to net device properties from an event handler without
races.
This patch changes the main net attributes to be created by the driver
core, which is done _before_ the event is sent out and will not require
the stat() loop of the WAIT_FOR_SYSFS key.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations. This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the procotol ioctl handlers can be simplified and we don't
need to export dev_ioctl.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From: Benjamin LaHaise <bcrl@kvack.org>
In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the
shared info structure results in gcc being forced to do a reload of the
pointer since it has no information on possible aliasing. Fix this by
using a pointer to refer to skb_shared_info.
By initializing skb_shared_info sequentially, the write combining buffers
can reduce the number of memory transactions to a single write. Reorder
the initialization in __alloc_skb() to match the structure definition.
There is also an alignment issue on 64 bit systems with skb_shared_info
by converting nr_frags to a short everything packs up nicely.
Also, pass the slab cache pointer according to the fclone flag instead
of using two almost identical function calls.
This raises bw_unix performance up to a peak of 707KB/s when combined
with the spinlock patch. It should help other networking protocols, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
static variables should not be explicitly initialised to 0. This causes
them to be placed in .data instead of .bss. This patch de-initialises 3
static variables in net/core/pktgen.c.
There are approximately 800 more such variables in the source tree
(2.6.15rc5). If there is more interrest I'd be willing to track down the
rest of these as well and de-initialise them as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.
Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing. This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue. This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.
Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6. This patch moves them into a one place and simplifies
the code somewhat.
This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.
This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.
Patch purpose:
The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.
Patch design approach:
The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.
A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.
Patch implementation details:
On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.
On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.
The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.
Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.
Testing:
The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.
The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|