summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
2013-08-04s390: move dummy io_remap_pfn_range() to asm/pgtable.hLinus Torvalds
commit 4f2e29031e6c67802e7370292dd050fd62f337ee upstream. Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added a helper function wrapper around io_remap_pfn_range(), and every other architecture defined it in <asm/pgtable.h>. The s390 choice of <asm/io.h> may make sense, but is not very convenient for this case, and gratuitous differences like that cause unexpected errors like this: mm/memory.c: In function 'vm_iomap_memory': mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration] Glory be the kbuild test robot who noticed this, bisected it, and reported it to the guilty parties (ie me). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> [bwh: Backported to 3.2: the macro was not defined, so this is an addition and not a move] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorerBen Hutchings
Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20s390/mm: fix flush_tlb_kernel_range()Heiko Carstens
commit f6a70a07079518280022286a1dceb797d12e1edf upstream. Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm() with &init_mm as argument. __tlb_flush_mm() however will only flush tlbs for the passed in mm if its mm_cpumask is not empty. For the init_mm however its mm_cpumask has never any bits set. Which in turn means that our flush_tlb_kernel_range() implementation doesn't work at all. This can be easily verified with a vmalloc/vfree loop which allocates a page, writes to it and then frees the page again. A crash will follow almost instantly. To fix this remove the cpumask_empty() check in __tlb_flush_mm() since there shouldn't be too many mms with a zero mm_cpumask, besides the init_mm of course. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20s390: critical section cleanup vs. machine checksMartin Schwidefsky
commit 6551fbdfd8b85d1ab5822ac98abb4fb449bcfae0 upstream. The current machine check code uses the registers stored by the machine in the lowcore at __LC_GPREGS_SAVE_AREA as the registers of the interrupted context. The registers 0-7 of a user process can get clobbered if a machine checks interrupts the execution of a critical section in entry[64].S. The reason is that the critical section cleanup code may need to modify the PSW and the registers for the previous context to get to the end of a critical section. If registers 0-7 have to be replaced the relevant copy will be in the registers, which invalidates the copy in the lowcore. The machine check handler needs to explicitly store registers 0-7 to the stack. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-28s390/kvm: Fix store status for ACRS/FPRSChristian Borntraeger
commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream. On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: The sie state descriptor doesnt have fields for guest ACRS,FPRS, those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Lets collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-17s390/timer: avoid overflow when programming clock comparatorHeiko Carstens
commit d911e03d097bdc01363df5d81c43f69432eb785c upstream. Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function is used to avoid overflows when converting TOD format values to nanosecond values. The kvm interrupt code formerly however only worked by accident because of an overflow. It tried to program a timer that would expire in more than ~29 years. Because of the old TOD-to-nanoseconds overflow bug the real expiry value however was much smaller, but now it isn't anymore. This however triggers yet another bug in the function that programs the clock comparator s390_next_ktime(): if the absolute "expires" value is after 2042 this will result in an overflow and the programmed value is lower than the current TOD value which immediatly triggers a clock comparator (= timer) interrupt. Since the timer isn't expired it will be programmed immediately again and so on... the result is a dead system. To fix this simply program the maximum possible value if an overflow is detected. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21s390/time: fix sched_clock() overflowHeiko Carstens
commit ed4f20943cd4c7b55105c04daedf8d63ab6d499c upstream. Converting a 64 Bit TOD format value to nanoseconds means that the value must be divided by 4.096. In order to achieve that we multiply with 125 and divide by 512. When used within sched_clock() this triggers an overflow after appr. 417 days. Resulting in a sched_clock() return value that is much smaller than previously and therefore may cause all sort of weird things in subsystems that rely on a monotonic sched_clock() behaviour. To fix this implement a tod_to_ns() helper function which converts TOD values without overflow and call this function from both places that open coded the conversion: sched_clock() and kvm_s390_handle_wait(). Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26s390/signal: set correct address space controlMartin Schwidefsky
commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream. If user space is running in primary mode it can switch to secondary or access register mode, this is used e.g. in the clock_gettime code of the vdso. If a signal is delivered to the user space process while it has been running in access register mode the signal handler is executed in access register mode as well which will result in a crash most of the time. Set the address space control bits in the PSW to the default for the execution of the signal handler and make sure that the previous address space control is restored on signal return. Take care that user space can not switch to the kernel address space by modifying the registers in the signal frame. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26s390/gup: add missing TASK_SIZE check to get_user_pages_fast()Heiko Carstens
commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream. When walking page tables we need to make sure that everything is within bounds of the ASCE limit of the task's address space. Otherwise we might calculate e.g. a pud pointer which is not within a pud and dereference it. So check against TASK_SIZE (which is the ASCE limit) before walking page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28s390: fix linker script for 31 bit buildsHeiko Carstens
commit c985cb37f1b39c2c8035af741a2a0b79f1fbaca7 upstream. Because of a change in the s390 arch backend of binutils (commit 23ecd77 "Pick the default arch depending on the target size" in binutils repo) 31 bit builds will fail since the linker would now try to create 64 bit binary output. Fix this by setting OUTPUT_ARCH to s390:31-bit instead of s390. Thanks to Andreas Krebbel for figuring out the issue. Fixes this build error: LD init/built-in.o s390x-4.7.2-ld: s390:31-bit architecture of input file `arch/s390/kernel/head.o' is incompatible with s390:64-bit output Cc: Andreas Krebbel <Andreas.Krebbel@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02oprofile, s390: Fix uninitialized memory access when writing to oprofilefsRobert Richter
commit 81ff3478d9ba7f0b48b0abef740e542fd83adf79 upstream. If oprofilefs_ulong_from_user() is called with count equals zero, *val remains unchanged. Depending on the implementation it might be uninitialized. Fixing users of oprofilefs_ulong_ from_user(). We missed these s390 changes with: 913050b oprofile: Fix uninitialized memory access when writing to writing to oprofilefs Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26s390/compat: fix mmap compat system callsHeiko Carstens
commit e85871218513c54f7dfdb6009043cb638f2fecbe upstream. The native 31 bit and the compat behaviour for the mmap system calls differ: In native 31 bit mode the passed in address for the mmap system call will be unmodified passed to sys_mmap_pgoff(). In compat mode however the passed in address will be modified with compat_ptr() which masks out the most significant bit. The result is that in native 31 bit mode each mmap request (with MAP_FIXED) will fail where the most significat bit is set, while in compat mode it may succeed. This odd behaviour was introduced with d3815898 "[S390] mmap: add missing compat_ptr conversion to both mmap compat syscalls". To restore a consistent behaviour accross native and compat mode this patch functionally reverts the above mentioned commit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26s390/compat: fix compat wrappers for process_vm system callsHeiko Carstens
commit 82aabdb6f1eb61e0034ec23901480f5dd23db7c4 upstream. The compat wrappers incorrectly called the non compat versions of the system process_vm system calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09s390/mm: fix fault handling for page table walk caseHeiko Carstens
commit 008c2e8f247f0a8db1e8e26139da12f3a3abcda0 upstream. Make sure the kernel does not incorrectly create a SIGBUS signal during user space accesses: For user space accesses in the switched addressing mode case the kernel may walk page tables and access user address space via the kernel mapping. If a page table entry is invalid the function __handle_fault() gets called in order to emulate a page fault and trigger all the usual actions like paging in a missing page etc. by calling handle_mm_fault(). If handle_mm_fault() returns with an error fixup handling is necessary. For the switched addressing mode case all errors need to be mapped to -EFAULT, so that the calling uaccess function can return -EFAULT to user space. Unfortunately the __handle_fault() incorrectly calls do_sigbus() if VM_FAULT_SIGBUS is set. This however should only happen if a page fault was triggered by a user space instruction. For kernel mode uaccesses the correct action is to only return -EFAULT. So user space may incorrectly see SIGBUS signals because of this bug. For current machines this would only be possible for the switched addressing mode case in conjunction with futex operations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09s390/mm: downgrade page table after fork of a 31 bit processMartin Schwidefsky
commit 0f6f281b731d20bfe75c13f85d33f3f05b440222 upstream. The downgrade of the 4 level page table created by init_new_context is currently done only in start_thread31. If a 31 bit process forks the new mm uses a 4 level page table, including the task size of 2<<42 that goes along with it. This is incorrect as now a 31 bit process can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade after fork. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09s390/idle: fix sequence handling vs cpu hotplugHeiko Carstens
commit 0008204ffe85d23382d6fd0f971f3f0fbe70bae2 upstream. The s390 idle accounting code uses a sequence counter which gets used when the per cpu idle statistics get updated and read. One assumption on read access is that only when the sequence counter is even and did not change while reading all values the result is valid. On cpu hotplug however the per cpu data structure gets initialized via a cpu hotplug notifier on CPU_ONLINE. CPU_ONLINE however is too late, since the onlined cpu is already running and might access the per cpu data. Worst case is that the data structure gets initialized while an idle thread is updating its idle statistics. This will result in an uneven sequence counter after an update. As a result user space tools like top, which access /proc/stat in order to get idle stats, will busy loop waiting for the sequence counter to become even again, which will never happen until the queried cpu will update its idle statistics again. And even then the sequence counter will only have an even value for a couple of cpu cycles. Fix this by moving the initialization of the per cpu idle statistics to cpu_init(). I prefer that solution in favor of changing the notifier to CPU_UP_PREPARE, which would be a different solution to the problem. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01s390/pfault: fix task state raceHeiko Carstens
commit d5e50a51ccbda36b379aba9d1131a852eb908dda upstream. When setting the current task state to TASK_UNINTERRUPTIBLE this can race with a different cpu. The other cpu could set the task state after it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again. This race was always present in the pfault interrupt code but didn't cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug vs missing completion interrupts" which relied on the fact that after setting the task state to TASK_UNINTERRUPTIBLE the task would really sleep. Since this is not necessarily the case the result may be a list corruption of the pfault_list or, as observed, a use-after-free bug while trying to access the task_struct of a task which terminated itself already. To fix this, we need to get a reference of the affected task when receiving the initial pfault interrupt and add special handling if we receive yet another initial pfault interrupt when the task is already enqueued in the pfault list. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-11[S390] Fix compile error in swab.hMartin Schwidefsky
The inline assembly in__arch_swab16p causes compile errors of the form: *error*: *invalid* '*asm*': operand number missing after %-*letter* The assembly uses the %O<n>/%R<n> notation but the first operand misses the operand number, it needs to be "%O1" instead of "%O". Reported-by: Gil Peleg <gilpeleg@servframe.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] Fix stfle() lowcore protection problemMichael Holzheu
The stfle() function writes into lowcore memory when stfl_fac_list is initialized with "S390_lowcore.stfl_fac_list = 0". For older compilers this triggers a lowcore exception. With newer compilers and "-OXX" compile option the bug does not show up because the "S390_lowcore.stfl_fac_list" initialization is removed by the compiler. The reason for thatis the incorrect "=m" (S390_lowcore.stfl_fac_list) constraint in the stfl inline assembly. The following shows the disassembly of the stfle() optimized code that is inlined in the lgr_info_get() function: 000000000011325c <lgr_info_get>: 11325c: eb 9f f0 60 00 24 stmg %r9,%r15,96(%r15) 113262: c0 d0 00 29 0e 47 larl %r13,634ef0 <servi..> 113268: a7 f1 3f c0 tml %r15,16320 11326c: b9 04 00 ef lgr %r14,%r15 113270: a7 84 00 01 je 113272 <lgr_info_g..> 113274: a7 fb ff c0 aghi %r15,-64 113278: b9 04 00 c2 lgr %r12,%r2 11327c: a7 29 00 01 lghi %r2,1 113280: e3 e0 f0 98 00 24 stg %r14,152(%r15) 113286: d7 97 c0 00 c0 00 xc 0(152,%r12),0(%r12) 11328c: c0 e5 00 28 db 4c brasl %r14,62e924 <add_e..> 113292: b2 b1 00 00 stfl 0 To fix the problem we now clear the S390_lowcore.stfl_fac_list at startup in "head.S" for all machine types before lowcore protection is enabled. In addition to that the "=m" constraint is replaced by "+m". Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] cpum_cf: get rid of compile warningsHeiko Carstens
Fix these: arch/s390/kernel/perf_cpum_cf.c:180:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat] arch/s390/kernel/perf_cpum_cf.c: In function 'cpumf_pmu_disable': arch/s390/kernel/perf_cpum_cf.c:205:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] irq: simple coding style changeHeiko Carstens
Use braces for if/else/list_for_each_entry bodies if the body consists of more than a single line. Otherwise I get confused and check if there is something broken whenever I see these code snippets. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] update default configurationMichael Holzheu
Add TASKSTATS options, enable CRASH_DUMP, and regenerate defconfig file with "make savedefconfig". Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] fix tlb flushing for page table pagesMartin Schwidefsky
Git commit 36409f6353fc2d7b6516e631415f938eadd92ffa "use generic RCU page-table freeing code" introduced a tlb flushing bug. Partially revert the above git commit and go back to s390 specific page table flush code. For s390 the TLB can contain three types of entries, "normal" TLB page-table entries, TLB combined region-and-segment-table (CRST) entries and real-space entries. Linux does not use real-space entries which leaves normal TLB entries and CRST entries. The CRST entries are intermediate steps in the page-table translation called translation paths. For example a 4K page access in a three-level page table setup will create two CRST TLB entries and one page-table TLB entry. The advantage of that approach is that a page access next to the previous one can reuse the CRST entries and needs just a single read from memory to create the page-table TLB entry. The disadvantage is that the TLB flushing rules are more complicated, before any page-table may be freed the TLB needs to be flushed. In short: the generic RCU page-table freeing code is incorrect for the CRST entries, in particular the check for mm_users < 2 is troublesome. This is applicable to 3.0+ kernels. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] kernel: Use local_irq_save() for memcpy_real()Michael Holzheu
Currently in the memcpy_real() function interrupts are disabled with __arch_local_irq_stnsm(). In order to notify lockdep that interrupts are disabled, with this patch local_irq_save() is used instead. The function __arch_local_irq_stnsm() is still used for switching to real mode. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-30[S390] Fix build errors (fallout from system.h disintegration)Heiko Carstens
Signed-off-by: Heiko Carstens <h.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-29Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds
Pull arch/tile (really asm-generic) update from Chris Metcalf: "These are a couple of asm-generic changes that apply to tile." * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: compat: use sys_sendfile64() implementation for sendfile syscall [PATCH v3] ipc: provide generic compat versions of IPC syscalls
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-28Delete all instances of asm/system.hDavid Howells
Delete all instances of asm/system.h as they should be redundant by this point. Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28Disintegrate asm/system.h for S390David Howells
Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org
2012-03-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 patches part 2 from Martin Schwidefsky: "Some minor improvements and one additional feature for the 3.4 merge window: Hendrik added perf support for the s390 CPU counters." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: [S390] register cpu devices for SMP=n [S390] perf: add support for s390x CPU counters [S390] oprofile: Allow multiple users of the measurement alert interrupt [S390] qdio: log all adapter characteristics [S390] Remove unncessary export of arch_pick_mmap_layout
2012-03-23coredump: remove VM_ALWAYSDUMP flagJason Baron
The motivation for this patchset was that I was looking at a way for a qemu-kvm process, to exclude the guest memory from its core dump, which can be quite large. There are already a number of filter flags in /proc/<pid>/coredump_filter, however, these allow one to specify 'types' of kernel memory, not specific address ranges (which is needed in this case). Since there are no more vma flags available, the first patch eliminates the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by the kernel to mark vdso and vsyscall pages. However, it is simple enough to check if a vma covers a vdso or vsyscall page without the need for this flag. The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new 'VM_NODUMP' flag, which can be set by userspace using new madvise flags: 'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters continue to work the same as before unless 'MADV_DONTDUMP' is set on the region. The qemu code which implements this features is at: http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch In my testing the qemu core dump shrunk from 383MB -> 13MB with this patch. I also believe that the 'MADV_DONTDUMP' flag might be useful for security sensitive apps, which might want to select which areas are dumped. This patch: The VM_ALWAYSDUMP flag is currently used by the coredump code to indicate that a vma is part of a vsyscall or vdso section. However, we can determine if a vma is in one these sections by checking it against the gate_vma and checking for a non-NULL return value from arch_vma_name(). Thus, freeing a valuable vma bit. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23[S390] register cpu devices for SMP=nMartin Schwidefsky
A kernel compiled with SMP=n will not register any cpu devices. The resulting kernel image will not boot with this error message: kernel BUG at fs/sysfs/group.c:65! Use GENERIC_CPU_DEVICES=y if SMP=n to get the missing cpu device. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-23[S390] perf: add support for s390x CPU countersHendrik Brueckner
Add a perf PMU to access the CPU-measurement counter facility CPUM CF. CPUM CF provides multiple counter sets for measuring generic, problem-state, and crypto activaties. Also an extended counter set for the IBM System z10 and IBM z196 mainframes is available. Counters from the basic and problem-state counter set are mapped to generic perf hardware events. Other counters are accessible through raw events. For a list of available counter sets and counters, see: - The Load-Program-Parameter and the CPU-Measurement Facilities (SA23-2260) - The CPU-Measurement Facility Extended Counters Definition for z10 and z196 (SA23-2261) Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-23[S390] oprofile: Allow multiple users of the measurement alert interruptJan Glauber
Prepare the measurement facility which is currently only used by oprofile for multiple users. To achieve that the measurement alert interrupt control bit needs to be protected. The measurement alert definitions are moved to a header file and an interrupt mask is added so that users can discard interrupts if they are for a different measurement subsystem. Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-23[S390] Remove unncessary export of arch_pick_mmap_layoutBen Hutchings
This function is defined for use in exec, not in modules. No other architecture exports its implementation. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 patches from Martin Schwidefsky: "The biggest patch is the rework of the smp code, something I wanted to do for some time. There are some patches for our various dump methods and one new thing: z/VM LGR detection. LGR stands for linux-guest- relocation and is the guest migration feature of z/VM. For debugging purposes we keep a log of the systems where a specific guest has lived." Fix up trivial conflict in arch/s390/kernel/smp.c due to the scheduler cleanup having removed some code next to removed s390 code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: [S390] kernel: Pass correct stack for smp_call_ipl_cpu() [S390] Ensure that vmcore_info pointer is never accessed directly [S390] dasd: prevent validate server for offline devices [S390] Remove monolithic build option for zcrypt driver. [S390] stack dump: fix indentation in output [S390] kernel: Add OS info memory interface [S390] Use block_sigmask() [S390] kernel: Add z/VM LGR detection [S390] irq: external interrupt code passing [S390] irq: set __ARCH_IRQ_EXIT_IRQS_DISABLED [S390] zfcpdump: Implement async sdias event processing [S390] Use copy_to_absolute_zero() instead of "stura/sturg" [S390] rework idle code [S390] rework smp code [S390] rename lowcore field [S390] Fix gcc 4.6.0 compile warning
2012-03-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "This is _not_ all; in particular, Miklos' and Jan's stuff is not there yet." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits) ext4: initialization of ext4_li_mtx needs to be done earlier debugfs-related mode_t whack-a-mole hfsplus: add an ioctl to bless files hfsplus: change finder_info to u32 hfsplus: initialise userflags qnx4: new helper - try_extent() qnx4: get rid of qnx4_bread/qnx4_getblk take removal of PF_FORKNOEXEC to flush_old_exec() trim includes in inode.c um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it um: embed ->stub_pages[] into mmu_context gadgetfs: list_for_each_safe() misuse ocfs2: fix leaks on failure exits in module_init ecryptfs: make register_filesystem() the last potential failure exit ntfs: forgets to unregister sysctls on register_filesystem() failure logfs: missing cleanup on register_filesystem() failure jfs: mising cleanup on register_filesystem() failure make configfs_pin_fs() return root dentry on success configfs: configfs_create_dir() has parent dentry in dentry->d_parent configfs: sanitize configfs_create() ...
2012-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking merge from David Miller: "1) Move ixgbe driver over to purely page based buffering on receive. From Alexander Duyck. 2) Add receive packet steering support to e1000e, from Bruce Allan. 3) Convert TCP MD5 support over to RCU, from Eric Dumazet. 4) Reduce cpu usage in handling out-of-order TCP packets on modern systems, also from Eric Dumazet. 5) Support the IP{,V6}_UNICAST_IF socket options, making the wine folks happy, from Erich Hoover. 6) Support VLAN trunking from guests in hyperv driver, from Haiyang Zhang. 7) Support byte-queue-limtis in r8169, from Igor Maravic. 8) Outline code intended for IP_RECVTOS in IP_PKTOPTIONS existed but was never properly implemented, Jiri Benc fixed that. 9) 64-bit statistics support in r8169 and 8139too, from Junchang Wang. 10) Support kernel side dump filtering by ctmark in netfilter ctnetlink, from Pablo Neira Ayuso. 11) Support byte-queue-limits in gianfar driver, from Paul Gortmaker. 12) Add new peek socket options to assist with socket migration, from Pavel Emelyanov. 13) Add sch_plug packet scheduler whose queue is controlled by userland daemons using explicit freeze and release commands. From Shriram Rajagopalan. 14) Fix FCOE checksum offload handling on transmit, from Yi Zou." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1846 commits) Fix pppol2tp getsockname() Remove printk from rds_sendmsg ipv6: fix incorrent ipv6 ipsec packet fragment cpsw: Hook up default ndo_change_mtu. net: qmi_wwan: fix build error due to cdc-wdm dependecy netdev: driver: ethernet: Add TI CPSW driver netdev: driver: ethernet: add cpsw address lookup engine support phy: add am79c874 PHY support mlx4_core: fix race on comm channel bonding: send igmp report for its master fs_enet: Add MPC5125 FEC support and PHY interface selection net: bpf_jit: fix BPF_S_LDX_B_MSH compilation net: update the usage of CHECKSUM_UNNECESSARY fcoe: use CHECKSUM_UNNECESSARY instead of CHECKSUM_PARTIAL on tx net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso ixgbe: Fix issues with SR-IOV loopback when flow control is disabled net/hyperv: Fix the code handling tx busy ixgbe: fix namespace issues when FCoE/DCB is not enabled rtlwifi: Remove unused ETH_ADDR_LEN defines igbvf: Use ETH_ALEN ... Fix up fairly trivial conflicts in drivers/isdn/gigaset/interface.c and drivers/net/usb/{Kconfig,qmi_wwan.c} as per David.
2012-03-20switch open-coded instances of d_make_root() to new helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes for v3.4 from Ingo Molnar * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) printk: Make it compile with !CONFIG_PRINTK sched/x86: Fix overflow in cyc2ns_offset sched: Fix nohz load accounting -- again! sched: Update yield() docs printk/sched: Introduce special printk_sched() for those awkward moments sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer sched: Cleanup cpu_active madness sched: Fix load-balance wreckage sched: Clean up parameter passing of proc_sched_autogroup_set_nice() sched: Ditch per cgroup task lists for load-balancing sched: Rename load-balancing fields sched: Move load-balancing arguments into helper struct sched/rt: Do not submit new work when PI-blocked sched/rt: Prevent idle task boosting sched/wait: Add __wake_up_all_locked() API sched/rt: Document scheduler related skip-resched-check sites sched/rt: Use schedule_preempt_disabled() sched/rt: Add schedule_preempt_disabled() sched/rt: Do not throttle when PI boosting sched/rt: Keep period timer ticking when rt throttling is active ...
2012-03-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events changes for v3.4 from Ingo Molnar: - New "hardware based branch profiling" feature both on the kernel and the tooling side, on CPUs that support it. (modern x86 Intel CPUs with the 'LBR' hardware feature currently.) This new feature is basically a sophisticated 'magnifying glass' for branch execution - something that is pretty difficult to extract from regular, function histogram centric profiles. The simplest mode is activated via 'perf record -b', and the result looks like this in perf report: $ perf record -b any_call,u -e cycles:u branchy $ perf report -b --sort=symbol 52.34% [.] main [.] f1 24.04% [.] f1 [.] f3 23.60% [.] f1 [.] f2 0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow 0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn 0.01% [k] _IO_vfprintf_internal [k] strchrnul 0.01% [k] __printf [k] _IO_vfprintf_internal 0.01% [k] main [k] __printf This output shows from/to branch columns and shows the highest percentage (from,to) jump combinations - i.e. the most likely taken branches in the system. "branches" can also include function calls and any other synchronous and asynchronous transitions of the instruction pointer that are not 'next instruction' - such as system calls, traps, interrupts, etc. This feature comes with (hopefully intuitive) flat ascii and TUI support in perf report. - Various 'perf annotate' visual improvements for us assembly junkies. It will now recognize function calls in the TUI and by hitting enter you can follow the call (recursively) and back, amongst other improvements. - Multiple threads/processes recording support in perf record, perf stat, perf top - which is activated via a comma-list of PIDs: perf top -p 21483,21485 perf stat -p 21483,21485 -ddd perf record -p 21483,21485 - Support for per UID views, via the --uid paramter to perf top, perf report, etc. For example 'perf top --uid mingo' will only show the tasks that I am running, excluding other users, root, etc. - Jump label restructurings and improvements - this includes the factoring out of the (hopefully much clearer) include/linux/static_key.h generic facility: struct static_key key = STATIC_KEY_INIT_FALSE; ... if (static_key_false(&key)) do unlikely code else do likely code ... static_key_slow_inc(); ... static_key_slow_inc(); ... The static_key_false() branch will be generated into the code with as little impact to the likely code path as possible. the static_key_slow_*() APIs flip the branch via live kernel code patching. This facility can now be used more widely within the kernel to micro-optimize hot branches whose likelihood matches the static-key usage and fast/slow cost patterns. - SW function tracer improvements: perf support and filtering support. - Various hardenings of the perf.data ABI, to make older perf.data's smoother on newer tool versions, to make new features integrate more smoothly, to support cross-endian recording/analyzing workflows better, etc. - Restructuring of the kprobes code, the splitting out of 'optprobes', and a corner case bugfix. - Allow the tracing of kernel console output (printk). - Improvements/fixes to user-space RDPMC support, allowing user-space self-profiling code to extract PMU counts without performing any system calls, while playing nice with the kernel side. - 'perf bench' improvements - ... and lots of internal restructurings, cleanups and fixes that made these features possible. And, as usual this list is incomplete as there were also lots of other improvements * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits) perf report: Fix annotate double quit issue in branch view mode perf report: Remove duplicate annotate choice in branch view mode perf/x86: Prettify pmu config literals perf report: Enable TUI in branch view mode perf report: Auto-detect branch stack sampling mode perf record: Add HEADER_BRANCH_STACK tag perf record: Provide default branch stack sampling mode option perf tools: Make perf able to read files from older ABIs perf tools: Fix ABI compatibility bug in print_event_desc() perf tools: Enable reading of perf.data files from different ABI rev perf: Add ABI reference sizes perf report: Add support for taken branch sampling perf record: Add support for sampling taken branch perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK x86/kprobes: Split out optprobe related code to kprobes-opt.c x86/kprobes: Fix a bug which can modify kernel code permanently x86/kprobes: Fix instruction recovery on optimized path perf: Add callback to flush branch_stack on context switch perf: Disable PERF_SAMPLE_BRANCH_* when not supported perf/x86: Add LBR software filter support for Intel CPUs ...
2012-03-20Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes for v3.4 from Ingo Molnar. The major features of this series are: - making RCU more aggressive about entering dyntick-idle mode in order to improve energy efficiency - converting a few more call_rcu()s to kfree_rcu()s - applying a number of rcutree fixes and cleanups to rcutiny - removing CONFIG_SMP #ifdefs from treercu - allowing RCU CPU stall times to be set via sysfs - adding CPU-stall capability to rcutorture - adding more RCU-abuse diagnostics - updating documentation - fixing yet more issues located by the still-ongoing top-to-bottom inspection of RCU, this time with a special focus on the CPU-hotplug code path. * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) rcu: Stop spurious warnings from synchronize_sched_expedited rcu: Hold off RCU_FAST_NO_HZ after timer posted rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit() rcu: Remove redundant check for rcu_head misalignment PTR_ERR should be called before its argument is cleared. rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep rcu: Trace only after NULL-pointer check rcu: Call out dangers of expedited RCU primitives rcu: Rework detection of use of RCU by offline CPUs lockdep: Add CPU-idle/offline warning to lockdep-RCU splat rcu: No interrupt disabling for rcu_prepare_for_idle() rcu: Move synchronize_sched_expedited() to rcutree.c rcu: Check for illegal use of RCU from offlined CPUs rcu: Update stall-warning documentation rcu: Add CPU-stall capability to rcutorture rcu: Make documentation give more realistic rcutorture duration rcutorture: Permit holding off CPU-hotplug operations during boot rcu: Print scheduling-clock information on RCU CPU stall-warning messages ...
2012-03-15[PATCH v3] ipc: provide generic compat versions of IPC syscallsChris Metcalf
When using the "compat" APIs, architectures will generally want to be able to make direct syscalls to msgsnd(), shmctl(), etc., and in the kernel we would want them to be handled directly by compat_sys_xxx() functions, as is true for other compat syscalls. However, for historical reasons, several of the existing compat IPC syscalls do not do this. semctl() expects a pointer to the fourth argument, instead of the fourth argument itself. msgsnd(), msgrcv() and shmat() expect arguments in different order. This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be set to preserve this behavior for ports that use it (x86, sparc, powerpc, s390, and mips). No actual semantics are changed for those architectures, and there is only a minimal amount of code refactoring in ipc/compat.c. Newer architectures like tile (and perhaps future architectures such as arm64 and unicore64) should not select this option, and thus can avoid having any IPC-specific code at all in their architecture-specific compat layer. In the same vein, if this option is not selected, IPC_64 mode is assumed, since that's what the <asm-generic> headers expect. The workaround code in "tile" for msgsnd() and msgrcv() is removed with this change; it also fixes the bug that shmat() and semctl() were not being properly handled. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-03-13Merge tag 'v3.3-rc7' into sched/coreIngo Molnar
Merge reason: merge back final fixes, prepare for the merge window. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13[S390] kernel: Pass correct stack for smp_call_ipl_cpu()Michael Holzheu
Currently pcpu_devices->panic_stack is passed to pcpu_delegate() in smp_call_ipl_cpu(). This is wrong because pcpu_delegate() expects the bottom (high address) of the stack and pcpu_devices->panic_stack points to the top (low address). We now pass the bottom of the stack which is pcpu_devices->panic_stack + PAGE_SIZE. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-12Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: We are going to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12sched: Cleanup cpu_active madnessPeter Zijlstra
Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-11[S390] Ensure that vmcore_info pointer is never accessed directlyMichael Holzheu
Because the vmcore_info pointer is not 8 byte aligned it never should not be accessed directly. The reason is that the compiler assumes that 64 bit pointer are always double word aligned. To ensure save access, the vmcore_info type in struct lowcore is changed from u64 to an u8[8] array and a comment is added. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>