summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
2015-10-27s390/boot/decompression: disable floating point in decompressorChristian Borntraeger
[ Upstream commit adc0b7fbf6fe9967505c0254d9535ec7288186ae ] my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for spilling/filling into a floating point register in our decompressor. This will cause an AFP-register data exception as the decompressor did not setup the additional floating point registers via cr0. That causes a program check loop that looked like a hang with one "Uncompressing Linux... " message (directly booted via kvm) or a loop of "Uncompressing Linux... " messages (when booted via zipl boot loader). The offending code in my build was 48e400: e3 c0 af ff ff 71 lay %r12,-1(%r10) -->48e406: b3 c1 00 1c ldgr %f1,%r12 48e40a: ec 6c 01 22 02 7f clij %r6,2,12,0x48e64e but gcc could do spilling into an fpr at any function. We can simply disable floating point support at that early stage. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-10-27s390/compat: correct uc_sigmask of the compat signal frameMartin Schwidefsky
[ Upstream commit 8d4bd0ed0439dfc780aab801a085961925ed6838 ] The uc_sigmask in the ucontext structure is an array of words to keep the 64 signal bits (or 1024 if you ask glibc but the kernel sigset_t only has 64 bits). For 64 bit the sigset_t contains a single 8 byte word, but for 31 bit there are two 4 byte words. The compat signal handler code uses a simple copy of the 64 bit sigset_t to the 31 bit compat_sigset_t. As s390 is a big-endian architecture this is incorrect, the two words in the 31 bit sigset_t array need to be swapped. Cc: <stable@vger.kernel.org> Reported-by: Stefan Liebler <stli@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-27kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFPMartin Schwidefsky
[ Upstream commit 7e01b5acd88b3f3108d8c4ce44e3205d67437202 ] Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code to override the gfp flags of the allocation for the kexec control page. The loop in kimage_alloc_normal_control_pages allocates pages with GFP_KERNEL until a page is found that happens to have an address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT the loop will keep allocating memory until the oom killer steps in. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-27s390/sclp: clear upper register halves in _sclp_print_earlyMartin Schwidefsky
[ Upstream commit f9c87a6f46d508eae0d9ae640be98d50f237f827 ] If the kernel is compiled with gcc 5.1 and the XZ compression option the decompress_kernel function calls _sclp_print_early in 64-bit mode while the content of the upper register half of %r6 is non-zero. This causes a specification exception on the servc instruction in _sclp_servc. The _sclp_print_early function saves and restores the upper registers halves but it fails to clear them for the 31-bit code of the mini sclp driver. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05s390/kdump: fix REGSET_VX_LOW vector register ELF notesMichael Holzheu
[ Upstream commit 3c8e5105e759e7b2d88ea8a85b1285e535bc7500 ] The REGSET_VX_LOW ELF notes should contain the lower 64 bit halfes of the first sixteen 128 bit vector registers. Unfortunately currently we copy the upper halfes. Fix this and correctly copy the lower halfes. Fixes: a62bc0739253 ("s390/kdump: add support for vector extension") Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10s390/mm: correct return value of pmd_pfnMartin Schwidefsky
[ Upstream commit 7cded342c09f633666e71ee1ce048f218a9c5836 ] Git commit 152125b7a882df36a55a8eadbea6d0edf1461ee7 "s390/mm: implement dirty bits for large segment table entries" broke the pmd_pfn function, it changed the return value from 'unsigned long' to 'int'. This breaks all machine configurations with memory above the 8TB line. Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10crypto: s390/ghash - Fix incorrect ghash icv buffer handling.Harald Freudenberger
[ Upstream commit a1cae34e23b1293eccbcc8ee9b39298039c3952a ] Multitheaded tests showed that the icv buffer in the current ghash implementation is not handled correctly. A move of this working ghash buffer value to the descriptor context fixed this. Code is tested and verified with an multithreaded application via af_alg interface. Cc: stable@vger.kernel.org Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17mm/hugetlb: use pmd_page() in follow_huge_pmd()Naoya Horiguchi
[ Upstream commit 97534127012f0e396eddea4691f4c9b170aed74b ] Commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte layout differ and using pte_page() on a huge pmd will return wrong results. Using pmd_page() instead fixes this. All architectures that were touched by that commit have pmd_page() defined, so this should not break anything on other architectures. Fixes: 61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*" Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.cz>, Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17s390/hibernate: fix save and restore of kernel text sectionHeiko Carstens
[ Upstream commit d74419495633493c9cd3f2bbeb7f3529d0edded6 ] Sebastian reported a crash caused by a jump label mismatch after resume. This happens because we do not save the kernel text section during suspend and therefore also do not restore it during resume, but use the kernel image that restores the old system. This means that after a suspend/resume cycle we lost all modifications done to the kernel text section. The reason for this is the pfn_is_nosave() function, which incorrectly returns that read-only pages don't need to be saved. This is incorrect since we mark the kernel text section read-only. We still need to make sure to not save and restore pages contained within NSS and DCSS segment. To fix this add an extra case for the kernel text section and only save those pages if they are not contained within an NSS segment. Fixes the following crash (and the above bugs as well): Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0 Found: c0 04 00 00 00 00 Expected: c0 f4 00 00 00 11 New: c0 04 00 00 00 00 Kernel panic - not syncing: Corrupted kernel text CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4 Call Trace: [<0000000000113972>] show_stack+0x72/0xf0 [<000000000081f15e>] dump_stack+0x6e/0x90 [<000000000081c4e8>] panic+0x108/0x2b0 [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108 [<0000000000112176>] __jump_label_transform+0x9e/0xd0 [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50 [<00000000001d1136>] multi_cpu_stop+0x12e/0x170 [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168 [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0 [<0000000000158baa>] kthread+0x10a/0x110 [<0000000000824a86>] kernel_thread_starter+0x6/0xc Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17KVM: s390: fix get_all_floating_irqsJens Freimann
[ Upstream commit 94aa033efcac47b09db22cb561e135baf37b7887 ] This fixes a bug introduced with commit c05c4186bbe4 ("KVM: s390: add floating irq controller"). get_all_floating_irqs() does copy_to_user() while holding a spin lock. Let's fix this by filling a temporary buffer first and copy it to userspace after giving up the lock. Cc: <stable@vger.kernel.org> # 3.18+: 69a8d4562638 KVM: s390: no need to hold... Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17KVM: s390: no need to hold the kvm->mutex for floating interruptsChristian Borntraeger
[ Upstream commit 69a8d456263849152826542c7cb0a164b90e68a8 ] The kvm mutex was (probably) used to protect against cpu hotplug. The current code no longer needs to protect against that, as we only rely on CPU data structures that are guaranteed to be available if we can access the CPU. (e.g. vcpu_create will put the cpu in the array AFTER the cpu is ready). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17KVM: s390: Zero out current VMDB of STSI before including level3 data.Ekaterina Tumanova
[ Upstream commit b75f4c9afac2604feb971441116c07a24ecca1ec ] s390 documentation requires words 0 and 10-15 to be reserved and stored as zeros. As we fill out all other fields, we can memset the full structure. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17KVM: s390: reinjection of irqs can fail in the tpi handlerDavid Hildenbrand
[ Upstream commit 15462e37ca848abac7477dece65f8af25febd744 ] The reinjection of an I/O interrupt can fail if the list is at the limit and between the dequeue and the reinjection, another I/O interrupt is injected (e.g. if user space floods kvm with I/O interrupts). This patch avoids this memory leak and returns -EFAULT in this special case. This error is not recoverable, so let's fail hard. This can later be avoided by not dequeuing the interrupt but working directly on the locked list. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17KVM: s390: fix handling of write errors in the tpi handlerDavid Hildenbrand
[ Upstream commit 261520dcfcba93ca5dfe671b88ffab038cd940c8 ] If the I/O interrupt could not be written to the guest provided area (e.g. access exception), a program exception was injected into the guest but "inti" wasn't freed, therefore resulting in a memory leak. In addition, the I/O interrupt wasn't reinjected. Therefore the dequeued interrupt is lost. This patch fixes the problem while cleaning up the function and making the cc and rc logic easier to handle. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28kvm: move advertising of KVM_CAP_IRQFD to common codePaolo Bonzini
[ Upstream commit dc9be0fac70a2ad86e31a81372bb0bdfb6945353 ] POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-06KVM: s390: avoid memory leaks if __inject_vm() failsDavid Hildenbrand
commit 428d53be5e7468769d4e7899cca06ed5f783a6e1 upstream. We have to delete the allocated interrupt info if __inject_vm() fails. Otherwise user space can keep flooding kvm with floating interrupts and provoke more and more memory leaks. Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: floating irqs: fix user triggerable endless loopDavid Hildenbrand
commit 8e2207cdd087ebb031e9118d1fd0902c6533a5e5 upstream. If a vm with no VCPUs is created, the injection of a floating irq leads to an endless loop in the kernel. Let's skip the search for a destination VCPU for a floating irq if no VCPUs were created. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream. The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: forward hrtimer if guest ckc not pending yetDavid Hildenbrand
commit 2d00f759427bb3ed963b60f570830e9eca7e1c69 upstream. Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block") changed the way pending guest clock comparator interrupts are detected. It was assumed that as soon as the hrtimer wakes up, the condition for the guest ckc is satisfied. This is however only true as long as adjclock() doesn't speed up the monotonic clock. Reason is that the hrtimer is based on CLOCK_MONOTONIC, the guest clock comparator detection is based on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the TOD clock, the hrtimer wakes the target VCPU up too early and the target VCPU will not detect any pending interrupts, therefore going back to sleep. It will never be woken up again because the hrtimer has finished. The VCPU is stuck. As a quick fix, we have to forward the hrtimer until the guest clock comparator is really due, to guarantee properly timed wake ups. As the hrtimer callback might be triggered on another cpu, we have to make sure that the timer is really stopped and not currently executing the callback on another cpu. This can happen if the vcpu thread is scheduled onto another physical cpu, but the timer base is not migrated. So lets use hrtimer_cancel instead of try_to_cancel. A proper fix might be to introduce a RAW based hrtimer. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds
commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream. The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29crypto: prefix module autoloading with "crypto-"Kees Cook
commit 5d26a105b5a73e5635eae0629b42fa0a90e07b7b upstream. This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16KVM: s390: Fix ipte lockingChristian Borntraeger
commit 1365039d0cb32c0cf96eb9f750f4277c9a90f87d upstream. ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte lock together with ACCESS_ONCE for the intial read. union ipte_control { unsigned long val; struct { unsigned long k : 1; unsigned long kh : 31; unsigned long kg : 32; }; }; [...] static void ipte_unlock_siif(struct kvm_vcpu *vcpu) { union ipte_control old, new, *ic; ic = &vcpu->kvm->arch.sca->ipte_control; do { new = old = ACCESS_ONCE(*ic); new.kh--; if (!new.kh) new.k = 0; } while (cmpxchg(&ic->val, old.val, new.val) != old.val); if (!new.kh) wake_up(&vcpu->kvm->arch.ipte_wq); } The new value, is loaded twice from memory with gcc 4.7.2 of fedora 18, despite the ACCESS_ONCE: ---> l %r4,0(%r3) <--- load first 32 bit of lock (k and kh) in r4 alfi %r4,2147483647 <--- add -1 to r4 llgtr %r4,%r4 <--- zero out the sign bit of r4 lg %r1,0(%r3) <--- load all 64 bit of lock into new lgr %r2,%r1 <--- load the same into old risbg %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of new llihf %r4,2147483647 ngrk %r4,%r1,%r4 jne aa0 <ipte_unlock+0xf8> nihh %r1,32767 lgr %r4,%r2 csg %r4,%r1,0(%r3) cgr %r2,%r4 jne a70 <ipte_unlock+0xc8> If the memory value changes between the first load (l) and the second load (lg) we are broken. If that happens VCPU threads will hang (unkillable) in handle_ipte_interlock. Andreas Krebbel analyzed this and tracked it down to a compiler bug in that version: "while it is not that obvious the C99 standard basically forbids duplicating the memory access also in that case. For an argumentation of a similiar case please see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43 For the implementation-defined cases regarding volatile there are some GCC-specific clarifications which can be found here: https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles I've tracked down the problem with a reduced testcase. The problem was that during a tree level optimization (SRA - scalar replacement of aggregates) the volatile marker is lost. And an RTL level optimizer (CSE - common subexpression elimination) then propagated the memory read into its second use introducing another access to the memory location. So indeed Christian's suspicion that the union access has something to do with it is correct (since it triggered the SRA optimization). This issue has been reported and fixed in the GCC 4.8 development cycle: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145" This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme that should work for all supported compilers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16KVM: s390: flush CPU on load controlChristian Borntraeger
commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream. some control register changes will flush some aspects of the CPU, e.g. POP explicitely mentions that for CR9-CR11 "TLBs may be cleared". Instead of trying to be clever and only flush on specific CRs, let play safe and flush on all lctl(g) as future machines might define new bits in CRs. Load control intercept should not happen that often. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16KVM: s390: Fix size of monitor-class number fieldThomas Huth
commit a36c5393266222129ce6f622e3bc3fb5463f290c upstream. The monitor-class number field is only 16 bits, so we have to use a u16 pointer to access it. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08groups: Consolidate the setgroups permission checksEric W. Biederman
commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream. Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-01s390: fix machine check handlingSebastian Ott
Commit eb7e7d76 "s390: Replace __get_cpu_var uses" broke machine check handling. We copy machine check information from per-cpu to a stack variable for local processing. Next we should zap the per-cpu variable, not the stack variable. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 update from Martin Schwidefsky: "One small improvement for the cputime accounting, two bug fixes and an update for the default configuration files" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/ftrace: add ftrace_graph_is_dead() check s390: update default configuration s390/vdso: fix stack corruption s390/time: use stck clock fast for do_account_vtime
2014-10-31Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus on the kernel side: - a revert for a newly introduced PMU driver which isn't complete yet and where we ran out of time with fixes (to be tried again in v3.19) - this makes up for a large chunk of the diffstat. - compilation warning fixes - a printk message fix - event_idx usage fixes/cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Trivial typo fix for --demangle perf tools: Fix report -F dso_from for data without branch info perf tools: Fix report -F dso_to for data without branch info perf tools: Fix report -F symbol_from for data without branch info perf tools: Fix report -F symbol_to for data without branch info perf tools: Fix report -F mispredict for data without branch info perf tools: Fix report -F in_tx for data without branch info perf tools: Fix report -F abort for data without branch info perf tools: Make CPUINFO_PROC an array to support different kernel versions perf callchain: Use global caching provided by libunwind perf/x86/intel: Revert incomplete and undocumented Broadwell client support perf/x86: Fix compile warnings for intel_uncore perf: Fix typos in sample code in the perf_event.h header perf: Fix and clean up initialization of pmu::event_idx perf: Fix bogus kernel printk perf diff: Add missing hists__init() call at tool start
2014-10-28perf: Fix and clean up initialization of pmu::event_idxPeter Zijlstra
Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Cody P Schafer <dev@codyps.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Himangi Saraogi <himangi774@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> Cc: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux390@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28s390/ftrace: add ftrace_graph_is_dead() checkHeiko Carstens
Add an ftrace_graph_is_dead() check to prepare_ftrace_return() in order to detect an internal ftrace graph error. This allows to prevent further ftrace graph handling and hopefully keeps the kernel alive. This patch is the same like for all other architectures. For unkown reasons s390 was left out. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390: update default configurationMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/vdso: fix stack corruptionHeiko Carstens
The kernel provided vdso functions do not get a stack frame from the calling function and therefore may not change the stack contents, unless they allocate space on their own. This problem was exposed with 070b7be633dc "s390/vdso: replace stck with stcke" which writes 16 bytes instead of 8 bytes into the stack frame. These additional 8 bytes however were indeed used by the caller (glibc) to save data and therefore this data was corrupted by the vdso code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/time: use stck clock fast for do_account_vtimeMartin Schwidefsky
The last high frequency call site of the STCK instruction is do_account_vtime. Replace it with the faster STCKF instruction. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "One patch to enable the BPF system call and three more bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uprobes: fix kprobes dependency s390: wire up bpf syscall s390/mm: fixing calls of pte_unmap_unlock s390/hmcdrv: Restrict s390 HMC driver to S390 arch
2014-10-19Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit updates from Eric Paris: "So this change across a whole bunch of arches really solves one basic problem. We want to audit when seccomp is killing a process. seccomp hooks in before the audit syscall entry code. audit_syscall_entry took as an argument the arch of the given syscall. Since the arch is part of what makes a syscall number meaningful it's an important part of the record, but it isn't available when seccomp shoots the syscall... For most arch's we have a better way to get the arch (syscall_get_arch) So the solution was two fold: Implement syscall_get_arch() everywhere there is audit which didn't have it. Use syscall_get_arch() in the seccomp audit code. Having syscall_get_arch() everywhere meant it was a useless flag on the stack and we could get rid of it for the typical syscall entry. The other changes inside the audit system aren't grand, fixed some records that had invalid spaces. Better locking around the task comm field. Removing some dead functions and structs. Make some things static. Really minor stuff" * git://git.infradead.org/users/eparis/audit: (31 commits) audit: rename audit_log_remove_rule to disambiguate for trees audit: cull redundancy in audit_rule_change audit: WARN if audit_rule_change called illegally audit: put rule existence check in canonical order next: openrisc: Fix build audit: get comm using lock to avoid race in string printing audit: remove open_arg() function that is never used audit: correct AUDIT_GET_FEATURE return message type audit: set nlmsg_len for multicast messages. audit: use union for audit_field values since they are mutually exclusive audit: invalid op= values for rules audit: use atomic_t to simplify audit_serial() kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0] audit: reduce scope of audit_log_fcaps audit: reduce scope of audit_net_id audit: arm64: Remove the audit arch argument to audit_syscall_entry arm64: audit: Add audit hook in syscall_trace_enter/exit() audit: x86: drop arch from __audit_syscall_entry() interface sparc: implement is_32bit_task sparc: properly conditionalize use of TIF_32BIT ...
2014-10-17s390/uprobes: fix kprobes dependencyJan Willeke
If kprobes is disabled uprobes will not compile. Fix this by including the correct header files. Signed-off-by: Jan Willeke <willeke@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-17s390: wire up bpf syscallHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-15s390/mm: fixing calls of pte_unmap_unlockDominik Dingel
pte_unmap works on page table entry pointers, derefencing should be avoided. As on s390 pte_unmap is a NOP, this is more a cleanup if we want to supply later such function. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-10-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "This patch set contains the main portion of the changes for 3.18 in regard to the s390 architecture. It is a bit bigger than usual, mainly because of a new driver and the vector extension patches. The interesting bits are: - Quite a bit of work on the tracing front. Uprobes is enabled and the ftrace code is reworked to get some of the lost performance back if CONFIG_FTRACE is enabled. - To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the IPTE range facility is added. - The rwlock code is re-factored to improve writer fairness and to be able to use the interlocked-access instructions. - The kernel part for the support of the vector extension is added. - The device driver to access the CD/DVD on the HMC is added, this will hopefully come in handy to improve the installation process. - Add support for control-unit initiated reconfiguration. - The crypto device driver is enhanced to enable the additional AP domains and to allow the new crypto hardware to be used. - Bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390/ftrace: simplify enabling/disabling of ftrace_graph_caller s390/ftrace: remove 31 bit ftrace support s390/kdump: add support for vector extension s390/disassembler: add vector instructions s390: add support for vector extension s390/zcrypt: Toleration of new crypto hardware s390/idle: consolidate idle functions and definitions s390/nohz: use a per-cpu flag for arch_needs_cpu s390/vtime: do not reset idle data on CPU hotplug s390/dasd: add support for control unit initiated reconfiguration s390/dasd: fix infinite loop during format s390/mm: make use of ipte range facility s390/setup: correct 4-level kernel page table detection s390/topology: call set_sched_topology early s390/uprobes: architecture backend for uprobes s390/uprobes: common library for kprobes and uprobes s390/rwlock: use the interlocked-access facility 1 instructions s390/rwlock: improve writer fairness s390/rwlock: remove interrupt-enabling rwlock variant. s390/mm: remove change bit override support ...
2014-10-14Merge branch 'x86-seccomp-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 seccomp changes from Ingo Molnar: "This tree includes x86 seccomp filter speedups and related preparatory work, which touches core seccomp facilities as well. The main idea is to split seccomp into two phases, to be able to enter a simple fast path for syscalls with ptrace side effects. There's no substantial user-visible (and ABI) effects expected from this, except a change in how we emit a better audit record for SECCOMP_RET_TRACE events" * 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls x86: Split syscall_trace_enter into two phases x86, entry: Only call user_exit if TIF_NOHZ x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit seccomp: Document two-phase seccomp and arch-provided seccomp_data seccomp: Allow arch code to provide seccomp_data seccomp: Refactor the filter callback and the API seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
2014-10-13Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave Hansen) - Various sched/idle refinements for better idle handling (Nicolas Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot) - sched/numa updates and optimizations (Rik van Riel) - sysbench speedup (Vincent Guittot) - capacity calculation cleanups/refactoring (Vincent Guittot) - Various cleanups to thread group iteration (Oleg Nesterov) - Double-rq-lock removal optimization and various refactorings (Kirill Tkhai) - various sched/deadline fixes ... and lots of other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/dl: Use dl_bw_of() under rcu_read_lock_sched() sched/fair: Delete resched_cpu() from idle_balance() sched, time: Fix build error with 64 bit cputime_t on 32 bit systems sched: Improve sysbench performance by fixing spurious active migration sched/x86: Fix up typo in topology detection x86, sched: Add new topology for multi-NUMA-node CPUs sched/rt: Use resched_curr() in task_tick_rt() sched: Use rq->rd in sched_setaffinity() under RCU read lock sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' sched: Use dl_bw_of() under RCU read lock sched/fair: Remove duplicate code from can_migrate_task() sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW sched: print_rq(): Don't use tasklist_lock sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock() sched: Fix the task-group check in tg_has_rt_tasks() sched/fair: Leverage the idle state info when choosing the "idlest" cpu sched: Let the scheduler see CPU idle states sched/deadline: Fix inter- exclusive cpusets migrations sched/deadline: Clear dl_entity params when setscheduling to different class sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault() ...
2014-10-10Merge branch 'for-v3.18' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull dma-mapping update from Marek Szyprowski: "Provide the dma write coherent api (available previously on ARM architecture) for all other architectures, which use dma_ops-based dma mapping implementation. This lets one to use the same code in the device drivers regardless of the selected architecture" * 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: dma-mapping: Provide write-combine allocations s390: Implement dma_{alloc,free}_attrs()
2014-10-09nosave: consolidate __nosave_{begin,end} in <asm/sections.h>Geert Uytterhoeven
The different architectures used their own (and different) declarations: extern __visible const void __nosave_begin, __nosave_end; extern const void __nosave_begin, __nosave_end; extern long __nosave_begin, __nosave_end; Consolidate them using the first variant in <asm/sections.h>. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Main changes: - Fix the deadlock reported by Dave Jones et al - Clean up and fix nohz_full interaction with arch abilities - nohz init code consolidation/cleanup" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: nohz full depends on irq work self IPI support nohz: Consolidate nohz full init code arm64: Tell irq work about self IPI support arm: Tell irq work about self IPI support x86: Tell irq work about self IPI support irq_work: Force raised irq work to run on irq work interrupt irq_work: Introduce arch_irq_work_has_interrupt() nohz: Move nohz full init call to tick init
2014-10-09s390/ftrace: simplify enabling/disabling of ftrace_graph_callerHeiko Carstens
We can simply patch the mask field within the branch relative on condition instruction at the beginning of the ftrace_graph_caller code block. This makes the logic even simpler and we get rid of the displacement calculation. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09s390/ftrace: remove 31 bit ftrace supportHeiko Carstens
31 bit and 64 bit diverge more and more and it is rather painful to keep both parts running. To make things simpler just remove the 31 bit support which nobody uses anyway. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09s390/kdump: add support for vector extensionMichael Holzheu
With this patch for kdump the s390 vector registers are stored into the prepared save areas in the old kernel and into the REGSET_VX_LOW and REGSET_VX_HIGH ELF notes for /proc/vmcore in the new kernel. The NT_S390_VXRS_LOW note contains the lower halves of the first 16 vector registers 0-15. The higher halves are stored in the floating point register ELF note. The NT_S390_VXRS_HIGH contains the full vector registers 16-31. The kernel provides a save area for storing vector register in case of machine checks. A pointer to this save are is stored in the CPU lowcore at offset 0x11b0. This save area is also used to save the registers for kdump. In case of a dumped crashed kdump those areas are used to extract the registers of the production system. The vector registers for remote CPUs are stored using the "store additional status at address" SIGP. For the dump CPU the vector registers are stored with the VSTM instruction. With this patch also zfcpdump stores the vector registers. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09s390/disassembler: add vector instructionsMartin Schwidefsky
Add the instruction introduced with the vector extension to the in-kernel disassembler. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09s390: add support for vector extensionMartin Schwidefsky
The vector extension introduces 32 128-bit vector registers and a set of instruction to operate on the vector registers. The kernel can control the use of vector registers for the problem state program with a bit in control register 0. Once enabled for a process the kernel needs to retain the content of the vector registers on context switch. The signal frame is extended to include the vector registers. Two new register sets NT_S390_VXRS_LOW and NT_S390_VXRS_HIGH are added to the regset interface for the debugger and core dumps. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>