summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-01-22net: fix IP early demux racesEric Dumazet
[ Upstream commit 5037e9ef9454917b047f9f3a19b4dd179fbf7cd4 ] David Wilder reported crashes caused by dst reuse. <quote David> I am seeing a crash on a distro V4.2.3 kernel caused by a double release of a dst_entry. In ipv4_dst_destroy() the call to list_empty() finds a poisoned next pointer, indicating the dst_entry has already been removed from the list and freed. The crash occurs 18 to 24 hours into a run of a network stress exerciser. </quote> Thanks to his detailed report and analysis, we were able to understand the core issue. IP early demux can associate a dst to skb, after a lookup in TCP/UDP sockets. When socket cache is not properly set, we want to store into sk->sk_dst_cache the dst for future IP early demux lookups, by acquiring a stable refcount on the dst. Problem is this acquisition is simply using an atomic_inc(), which works well, unless the dst was queued for destruction from dst_release() noticing dst refcount went to zero, if DST_NOCACHE was set on dst. We need to make sure current refcount is not zero before incrementing it, or risk double free as David reported. This patch, being a stable candidate, adds two new helpers, and use them only from IP early demux problematic paths. It might be possible to merge in net-next skb_dst_force() and skb_dst_force_safe(), but I prefer having the smallest patch for stable kernels : Maybe some skb_dst_force() callers do not expect skb->dst can suddenly be cleared. Can probably be backported back to linux-3.6 kernels Reported-by: David J. Wilder <dwilder@us.ibm.com> Tested-by: David J. Wilder <dwilder@us.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-22net: add validation for the socket syscall protocol argumentHannes Frederic Sowa
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ] 郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. CVE: CVE-2015-8543 Cc: Cong Wang <cwang@twopensource.com> Reported-by: 郭永刚 <guoyonggang@360.cn> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-22sctp: update the netstamp_needed counter when copying socketsMarcelo Ricardo Leitner
[ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ] Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-22vxlan: fix incorrect RCO bit in VXLAN headerJiri Benc
[ Upstream commit c5fb8caaf91ea6a92920cf24db10cfc94d58de0f ] Commit 3511494ce2f3d ("vxlan: Group Policy extension") changed definition of VXLAN_HF_RCO from 0x00200000 to BIT(24). This is obviously incorrect. It's also in violation with the RFC draft. Fixes: 3511494ce2f3d ("vxlan: Group Policy extension") Cc: Thomas Graf <tgraf@suug.ch> Cc: Tom Herbert <therbert@google.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-22USB: add quirk for devices with broken LPMAlan Stern
commit ad87e03213b552a5c33d5e1e7a19a73768397010 upstream. Some USB device / host controller combinations seem to have problems with Link Power Management. For example, Steinar found that his xHCI controller wouldn't handle bandwidth calculations correctly for two video cards simultaneously when LPM was enabled, even though the bus had plenty of bandwidth available. This patch introduces a new quirk flag for devices that should remain disabled for LPM, and creates quirk entries for Steinar's devices. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14ext4, jbd2: ensure entering into panic after recording an error in superblockDaeho Jeong
commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream. If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14net_sched: fix qdisc_tree_decrease_qlen() racesEric Dumazet
[ Upstream commit 4eaf3b84f2881c9c028f1d5e76c52ab575fe3a66 ] qdisc_tree_decrease_qlen() suffers from two problems on multiqueue devices. One problem is that it updates sch->q.qlen and sch->qstats.drops on the mq/mqprio root qdisc, while it should not : Daniele reported underflows errors : [ 681.774821] PAX: sch->q.qlen: 0 n: 1 [ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head; [ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1 [ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015 [ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c [ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b [ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001 [ 681.774962] Call Trace: [ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f [ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50 [ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160 [ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel] [ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel] [ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0 [ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160 [ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200 [ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30 [ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260 [ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40 [ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100 [ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170 [ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70 mq/mqprio have their own ways to report qlen/drops by folding stats on all their queues, with appropriate locking. A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup() without proper locking : concurrent qdisc updates could corrupt the list that qdisc_match_from_root() parses to find a qdisc given its handle. Fix first problem adding a TCQ_F_NOPARENT qdisc flag that qdisc_tree_decrease_qlen() can use to abort its tree traversal, as soon as it meets a mq/mqprio qdisc children. Second problem can be fixed by RCU protection. Qdisc are already freed after RCU grace period, so qdisc_list_add() and qdisc_list_del() simply have to use appropriate rcu list variants. A future patch will add a per struct netdev_queue list anchor, so that qdisc_tree_decrease_qlen() can have more efficient lookups. Reported-by: Daniele Fucini <dfucini@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14ipv6: add complete rcu protection around np->optEric Dumazet
[ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ] This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14ipv6: distinguish frag queues by device for multicast and link-local packetsMichal Kubeček
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ] If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14ip_tunnel: disable preemption when updating per-cpu tstatsJason A. Donenfeld
[ Upstream commit b4fe85f9c9146f60457e9512fb6055e69e6a7a65 ] Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-14unix: avoid use-after-free in ep_remove_wait_queueRainer Weikusat
[ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ] Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09tty: audit: Fix audit sourcePeter Hurley
commit 6b2a3d628aa752f0ab825fc6d4d07b09e274d1c1 upstream. The data to audit/record is in the 'from' buffer (ie., the input read buffer). Fixes: 72586c6061ab ("n_tty: Fix auditing support for cannonical mode") Cc: Miloslav Trmač <mitr@redhat.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09KVM: Provide function for VCPU lookup by idDavid Hildenbrand
commit db27a7a37aa0b1f8b373f8b0fb72a2ccaafb85b7 upstream. Let's provide a function to lookup a VCPU by id. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [split patch from refactoring patch] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09net: avoid NULL deref in inet_ctl_sock_destroy()Eric Dumazet
[ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ] Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate on all possible cpus and call inet_ctl_sock_destroy(), with eventual NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09ASoC: wm8904: Correct number of EQ registersCharles Keepax
commit 97aff2c03a1e4d343266adadb52313613efb027f upstream. There are 24 EQ registers not 25, I suspect this bug came about because the registers start at EQ1 not zero. The bug is relatively harmless as the extra register written is an unused one. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09ASoC: Add info callback for SX_TLV controlsCharles Keepax
commit 34198710f55b5f359f43e67d9a08fe5aadfbca1b upstream. SX_TLV controls are intended for situations where the register behind the control has some non-zero value indicating the minimum gain and then gains increasing from there and eventually overflowing through zero. Currently every CODEC implementing these controls specifies the minimum as the non-zero value for the minimum and the maximum as the number of gain settings available. This means when the info callback subtracts the minimum value from the maximum value to calculate the number of gain levels available it is actually under reporting the available levels. This patch fixes this issue by adding a new snd_soc_info_volsw_sx callback that does not subtract the minimum value. Fixes: 1d99f2436d0d ("ASoC: core: Rework SOC_DOUBLE_R_SX_TLV add SOC_SINGLE_SX_TLV") Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Brian Austin <brian.austin@cirrus.com> Tested-by: Brian Austin <brian.austin@cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27locks: inline posix_lock_file_wait and flock_lock_file_waitJeff Layton
commit ee296d7c5709440f8abd36b5b65c6b3e388538d9 upstream. They just call file_inode and then the corresponding *_inode_file_wait function. Just make them static inlines instead. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: William Dauchy <william@gandi.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27locks: new helpers - flock_lock_inode_wait and posix_lock_inode_waitJeff Layton
commit 29d01b22eaa18d8b46091d3c98c6001c49f78e4a upstream. Allow callers to pass in an inode instead of a filp. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: William Dauchy <william@gandi.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27sched/preempt: Fix cond_resched_lock() and cond_resched_softirq()Konstantin Khlebnikov
commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream. These functions check should_resched() before unlocking spinlock/bh-enable: preempt_count always non-zero => should_resched() always returns false. cond_resched_lock() worked iff spin_needbreak is set. This patch adds argument "preempt_offset" to should_resched(). preempt_count offset constants for that: PREEMPT_DISABLE_OFFSET - offset after preempt_disable() PREEMPT_LOCK_OFFSET - offset after spin_lock() SOFTIRQ_DISABLE_OFFSET - offset after local_bh_distable() SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh() Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers") Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27sched/preempt: Rename PREEMPT_CHECK_OFFSET to PREEMPT_DISABLE_OFFSETFrederic Weisbecker
commit 90b62b5129d5cb50f62f40e684de7a1961e57197 upstream. "CHECK" suggests it's only used as a comparison mask. But now it's used further as a config-conditional preempt disabler offset. Lets disambiguate this name. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431441711-29753-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27drm/dp/mst: make mst i2c transfer code more robust.Dave Airlie
commit ae491542cbbbcca0ec8938c37d4079a985e58440 upstream. This zeroes the msg so no random stack data ends up getting sent, it also limits the function to not accepting > 4 i2c msgs. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27af_unix: Convert the unix_sk macro to an inline function for type safetyAaron Conole
[ Upstream commit 4613012db1d911f80897f9446a49de817b2c4c47 ] As suggested by Eric Dumazet this change replaces the #define with a static inline function to enjoy complaints by the compiler when misusing the API. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27net: add pfmemalloc check in sk_add_backlog()Eric Dumazet
[ Upstream commit c7c49b8fde26b74277188bdc6c9dca38db6fa35b ] Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27skbuff: Fix skb checksum partial check.Pravin B Shelar
[ Upstream commit 31b33dfb0a144469dd805514c9e63f4993729a48 ] Earlier patch 6ae459bda tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27skbuff: Fix skb checksum flag on skb pullPravin B Shelar
[ Upstream commit 6ae459bdaaeebc632b16e54dcbabb490c6931d61 ] VXLAN device can receive skb with checksum partial. But the checksum offset could be in outer header which is pulled on receive. This results in negative checksum offset for the skb. Such skb can cause the assert failure in skb_checksum_help(). Following patch fixes the bug by setting checksum-none while pulling outer header. Following is the kernel panic msg from old kernel hitting the bug. ------------[ cut here ]------------ kernel BUG at net/core/dev.c:1906! RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150 Call Trace: <IRQ> [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch] [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch] [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch] [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch] [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch] [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch] [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0 [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0 [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20 [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280 [<ffffffff81550128>] ip_local_deliver+0x88/0x90 [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370 [<ffffffff81550365>] ip_rcv+0x235/0x300 [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620 [<ffffffff8151c360>] netif_receive_skb+0x80/0x90 [<ffffffff81459935>] virtnet_poll+0x555/0x6f0 [<ffffffff8151cd04>] net_rx_action+0x134/0x290 [<ffffffff810683d8>] __do_softirq+0xa8/0x210 [<ffffffff8162fe6c>] call_softirq+0x1c/0x30 [<ffffffff810161a5>] do_softirq+0x65/0xa0 [<ffffffff810687be>] irq_exit+0x8e/0xb0 [<ffffffff81630733>] do_IRQ+0x63/0xe0 [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e Reported-by: Anupam Chanda <achanda@vmware.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22overlayfs: Make f_path always point to the overlay and f_inode to the underlayDavid Howells
commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 upstream. Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: "Kamata, Munehisa" <kamatam@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22mmc: sdhci: fix dma memory leak in sdhci_pre_req()Haibo Chen
commit d31911b9374a76560d2c8ea4aa6ce5781621e81d upstream. Currently one mrq->data maybe execute dma_map_sg() twice when mmc subsystem prepare over one new request, and the following log show up: sdhci[sdhci_pre_dma_transfer] invalid cookie: 24, next-cookie 25 In this condition, mrq->date map a dma-memory(1) in sdhci_pre_req for the first time, and map another dma-memory(2) in sdhci_prepare_data for the second time. But driver only unmap the dma-memory(2), and dma-memory(1) never unmapped, which cause the dma memory leak issue. This patch use another method to map the dma memory for the mrq->data which can fix this dma memory leak issue. Fixes: 348487cb28e6 ("mmc: sdhci: use pipeline mmc requests to improve performance") Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Haibo Chen <haibo.chen@freescale.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22security: fix typo in security_task_prctlJann Horn
commit b7f76ea2ef6739ee484a165ffbac98deb855d3d3 upstream. Signed-off-by: Jann Horn <jann@thejh.net> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nf_qeueue: Drop queue entries on nf_unregister_hookEric W. Biederman
commit 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 upstream. Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22netfilter: nf_tables: Use 32 bit addressing register from nft_type_to_reg()Pablo Neira Ayuso
commit bf798657eb5ba57552096843c315f096fdf9b715 upstream. nft_type_to_reg() needs to return the register in the new 32 bit addressing, otherwise we hit EINVAL when using mappings. Fixes: 49499c3 ("netfilter: nf_tables: switch registers to 32 bit addressing") Reported-by: Andreas Schultz <aschultz@tpip.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22x86/xen: Support kexec/kdump in HVM guests by doing a soft resetVitaly Kuznetsov
commit 0b34a166f291d255755be46e43ed5497cdd194f2 upstream. Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22target/iscsi: Fix np_ip bracket issue by removing np_ipAndy Grover
commit 76c28f1fcfeb42b47f798fe498351ee1d60086ae upstream. Revert commit 1997e6259, which causes double brackets on ipv6 inaddr_any addresses. Since we have np_sockaddr, if we need a textual representation we can use "%pISc". Change iscsit_add_network_portal() and iscsit_add_np() signatures to remove *ip_str parameter. Fix and extend some comments earlier in the function. Tested to work for :: and ::1 via iscsiadm, previously :: failed, see https://bugzilla.redhat.com/show_bug.cgi?id=1249107 . Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29net: sched: fix refcount imbalance in actionsDaniel Borkmann
[ Upstream commit 28e6b67f0b292f557468c139085303b15f1a678f ] Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action. Test case 1: FOO="1,6 0 0 4294967295," BAR="1,6 0 0 4294967294," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \ action bpf bytecode "$FOO" tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 1 bind 1 tc actions replace action bpf bytecode "$BAR" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 1 ref 2 bind 1 tc actions replace action bpf bytecode "$FOO" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 3 bind 1 Test case 2: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 2 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 3 bind 1 What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases: 1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handeled. 2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case). In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error. Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler. One possible solution would be to add a flag thus we possibly trigger the -EPERM ony in situations where it is indeed relevant. After the patch, above test cases have correct reference count again. Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29ipv6: lock socket in ip6_datagram_connect()Eric Dumazet
[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ] ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29jbd2: avoid infinite loop when destroying aborted journalJan Kara
commit 841df7df196237ea63233f0f9eaa41db53afd70f upstream. Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal superblock fails" changed jbd2_cleanup_journal_tail() to return EIO when the journal is aborted. That makes logic in jbd2_log_do_checkpoint() bail out which is fine, except that jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make a progress in cleaning the journal. Without it jbd2_journal_destroy() just loops in an infinite loop. Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of jbd2_log_do_checkpoint() fails with error. Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29iommu/tegra-smmu: Parameterize number of TLB linesThierry Reding
commit 11cec15bf3fb498206ef63b1fa26c27689e02d0e upstream. The number of TLB lines was increased from 16 on Tegra30 to 32 on Tegra114 and later. Parameterize the value so that the initial default can be set accordingly. On Tegra30, initializing the value to 32 would effectively disable the TLB and hence cause massive latencies for memory accesses translated through the SMMU. This is especially noticeable for isochronuous clients such as display, whose FIFOs would continuously underrun. Fixes: 891846516317 ("memory: Add NVIDIA Tegra memory controller support") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29SUNRPC: Ensure that we wait for connections to complete before retryingTrond Myklebust
commit 0fdea1e8a2853f79d39b8555cc9de16a7e0ab26f upstream. Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29net: sunrpc: fix tracepoint Warning: unknown op '->'Pratyush Anand
commit 051ac3848a94f21cfdec899cc9c65ce7f9f116fa upstream. `perf stat -e sunrpc:svc_xprt_do_enqueue true` results in Warning: unknown op '->' Warning: [sunrpc:svc_xprt_do_enqueue] unknown op '->' Similar warning for svc_handle_xprt as well. Actually TP_printk() should never dereference an address saved in the ring buffer that points somewhere in the kernel. There's no guarantee that that object still exists (with the exception of static strings). Therefore change all the arguments for TP_printk(), so that it references values existing in the ring buffer only. While doing that, also fix another possible bug when argument xprt could be NULL and TP_fast_assign() tries to access it's elements. Signed-off-by: Pratyush Anand <panand@redhat.com> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Fixes: 83a712e0afef "sunrpc: add some tracepoints around ..." Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-29mm: make page pfmemalloc check more robustMichal Hocko
commit 2f064f3485cd29633ad1b3cfb00cc519509a3d72 upstream. Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set set page->mapping to NULL and leave page->index value alone. Due to being in union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with a NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case so the page confuses __skb_fill_page_desc which interprets the index as pfmemalloc flag and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrive. The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see the value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes. The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do). [akpm@linux-foundation.org: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.com> Debugged-by: Jiri Bohac <jbohac@suse.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21fs: create and use seq_show_option for escapingKees Cook
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream. Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21ACPI, PCI: Penalize legacy IRQ used by ACPI SCIJiang Liu
commit 5d0ddfebb93069061880fc57ee4ba7246bd1e1ee upstream. Nick Meier reported a regression with HyperV that " After rebooting the VM, the following messages are logged in syslog when trying to load the tulip driver: tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007) tulip: 0000:00:0a.0: PCI INT A: failed to register GSI tulip: Cannot enable tulip board #0, aborting tulip: probe of 0000:00:0a.0 failed with error -16 Errors occur in 3.19.0 kernel Works in 3.17 kernel. " According to the ACPI dump file posted by Nick at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072 The ACPI MADT table includes an interrupt source overridden entry for ACPI SCI: [236h 0566 1] Subtable Type : 02 <Interrupt Source Override> [237h 0567 1] Length : 0A [238h 0568 1] Bus : 00 [239h 0569 1] Source : 09 [23Ah 0570 4] Interrupt : 00000009 [23Eh 0574 2] Flags (decoded below) : 000D Polarity : 1 Trigger Mode : 3 And in DSDT table, we have _PRT method to define PCI interrupts, which eventually goes to: Name (PRSA, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSB, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSC, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) Name (PRSD, ResourceTemplate () { IRQ (Level, ActiveLow, Shared, ) {3,4,5,7,9,10,11,12,14,15} }) According to the MADT and DSDT tables, IRQ 9 may be used for: 1) ACPI SCI in level, high mode 2) PCI legacy IRQ in level, low mode So there's a conflict in polarity setting for IRQ 9. Prior to commit cd68f6bd53cf ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI"), ACPI SCI is handled specially and there's no check for conflicts between ACPI SCI and PCI legagy IRQ. And it seems that the HyperV hypervisor doesn't make use of the polarity configuration in IOAPIC entry, so it just works. Commit cd68f6bd53cf gets rid of the specially handling of ACPI SCI, and then the pin attribute checking code discloses the conflicts between ACPI SCI and PCI legacy IRQ on HyperV virtual machine, and rejects the request to assign IRQ9 to PCI devices. So penalize legacy IRQ used by ACPI SCI and mark it unusable if ACPI SCI attributes conflict with PCI IRQ attributes. Please refer to following links for more information: https://bugzilla.kernel.org/show_bug.cgi?id=101301 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072 Fixes: cd68f6bd53cf ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI") Reported-and-tested-by: Nick Meier <nmeier@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21PCI: Add dev_flags bit to access VPD through function 0Mark Rustad
commit 932c435caba8a2ce473a91753bad0173269ef334 upstream. Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through function 0 to provide VPD access on other functions. This is for hardware devices that provide copies of the same VPD capability registers in multiple functions. Because the kernel expects that each function has its own registers, both the locking and the state tracking are affected by VPD accesses to different functions. On such devices for example, if a VPD write is performed on function 0, *any* later attempt to read VPD from any other function of that device will hang. This has to do with how the kernel tracks the expected value of the F bit per function. Concurrent accesses to different functions of the same device can not only hang but also corrupt both read and write VPD data. When hangs occur, typically the error message: vpd r/w failed. This is likely a firmware bug on this device. will be seen. Never set this bit on function 0 or there will be an infinite recursion. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-21iio: Add inverse unit conversion macrosLars-Peter Clausen
commit c689a923c867eac40ed3826c1d9328edea8b6bc7 upstream. Add inverse unit conversion macro to convert from standard IIO units to units that might be used by some devices. Those are useful in combination with scale factors that are specified as IIO_VAL_FRACTIONAL. Typically the denominator for those specifications will contain the maximum raw value the sensor will generate and the numerator the value it maps to in a specific unit. Sometimes datasheets specify those in different units than the standard IIO units (e.g. degree/s instead of rad/s) and so we need to do a unit conversion. From a mathematical point of view it does not make a difference whether we apply the unit conversion to the numerator or the inverse unit conversion to the denominator since (x / y) / z = x / (y * z). But as the denominator is typically a larger value and we are rounding both the numerator and denominator to integer values using the later method gives us a better precision (E.g. the relative error is smaller if we round 8000.3 to 8000 rather than rounding 8.3 to 8). This is where in inverse unit conversion macros will be used. Marked for stable as used by some upcoming fixes. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13genirq: Introduce irq_chip_set_type_parent() helperGrygorii Strashko
commit b7560de198222994374c1340a389f12d5efb244a upstream. This helper is required for irq chips which do not implement a irq_set_type callback and need to call down the irq domain hierarchy for the actual trigger type change. This helper is required to fix further wreckage caused by the conversion of TI OMAP to hierarchical irq domains and therefor tagged for stable. [ tglx: Massaged changelog ] Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: <linux@arm.linux.org.uk> Cc: <nsekhar@ti.com> Cc: <jason@lakedaemon.net> Cc: <balbi@ti.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: <tony@atomide.com> Cc: <marc.zyngier@arm.com> Cc: stable@vger.kernel.org # 4.1 Link: http://lkml.kernel.org/r/1439554830-19502-3-git-send-email-grygorii.strashko@ti.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13Revert "libata: Implement NCQ autosense"Tejun Heo
commit 74a80d67b8316eb3fbeb73dafc060a5a0a708587 upstream. This reverts commit 42b966fbf35da9c87f08d98f9b8978edf9e717cf. As implemented, ACS-4 sense reporting for ATA devices bypasses error diagnosis and handling in libata degrading EH behavior significantly. Revert the related changes for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13Revert "libata: Implement support for sense data reporting"Tejun Heo
commit 84ded2f8e7dda336fc2fb3570726ceb3b3b3590f upstream. This reverts commit fe7173c206de63fc28475ee6ae42ff95c05692de. As implemented, ACS-4 sense reporting for ATA devices bypasses error diagnosis and handling in libata degrading EH behavior significantly. Revert the related changes for now. ATA_ID_COMMAND_SET_3/4 constants are not reverted as they're used by later changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13Revert "libata-eh: Set 'information' field for autosense"Tejun Heo
commit fe16d4f202c59a560533a223bc6375739ee30944 upstream. This reverts commit a1524f226a02aa6edebd90ae0752e97cfd78b159. As implemented, ACS-4 sense reporting for ATA devices bypasses error diagnosis and handling in libata degrading EH behavior significantly. Revert the related changes for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13drm/radeon: add new OLAND pci idAlex Deucher
commit e037239e5e7b61007763984aa35a8329596d8c88 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflictScott Wood
commit 5f867db63473f32cce1b868e281ebd42a41f8fad upstream. Commit 66507c7bc8895f0da6b ("mtd: nand: Add support to use nand_base poi databuf as bounce buffer") added a flag NAND_USE_BOUNCE_BUFFER using the same bit value as the existing NAND_BUSWIDTH_AUTO. Cc: Kamal Dasu <kdasu.kdev@gmail.com> Fixes: 66507c7bc8895f0da6b ("mtd: nand: Add support to use nand_base poi databuf as bounce buffer") Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16PCI: Restore PCI_MSIX_FLAGS_BIRMASK definitionMichael S. Tsirkin
commit c9ddbac9c89110f77cb0fa07e634aaf1194899aa upstream. 09a2c73ddfc7 ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition") removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was unused in the kernel. But that breaks user programs that were using it (QEMU in particular). Restore the PCI_MSIX_FLAGS_BIRMASK definition. [bhelgaas: changelog] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>