summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/time.c
AgeCommit message (Collapse)Author
2012-01-12powerpc/time: Handle wrapping of decrementerAnton Blanchard
commit 37fb9a0231ee43d42d069863bdfd567fca2b61af upstream. When re-enabling interrupts we have code to handle edge sensitive decrementers by resetting the decrementer to 1 whenever it is negative. If interrupts were disabled long enough that the decrementer wrapped to positive we do nothing. This means interrupts can be delayed for a long time until it finally goes negative again. While we hope interrupts are never be disabled long enough for the decrementer to go positive, we have a very good test team that can drive any kernel into the ground. The softlockup data we get back from these fails could be seconds in the future, completely missing the cause of the lockup. We already keep track of the timebase of the next event so use that to work out if we should trigger a decrementer exception. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-01irq_work, ppc: Fix up arch hooksPeter Zijlstra
Commit e360adbe29 ("irq_work: Add generic hardirq context callbacks") fouled up the ppc bit, not properly naming the arch specific function that raises the 'self-IPI'. Cc: Huang Ying <ying.huang@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: stable@kernel.org # 37+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-18powerpc: Fix oops if scan_dispatch_log is called too earlyAnton Blanchard
We currently enable interrupts before the dispatch log for the boot cpu is setup. If a timer interrupt comes in early enough we oops in scan_dispatch_log: Unable to handle kernel paging request for data at address 0x00000010 ... .scan_dispatch_log+0xb0/0x170 .account_system_vtime+0xa0/0x220 .irq_enter+0x88/0xc0 .do_IRQ+0x48/0x230 The patch below adds a check to scan_dispatch_log to ensure the dispatch log has been allocated. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-01powerpc: Make decrementer interrupt robust against offlined CPUsBenjamin Herrenschmidt
With some implementations, it is possible that a timer interrupt occurs every few seconds on an offline CPU. In this case, just re-arm the decrementer and return immediately Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-30powerpc: Fix accounting of softirq time when idleAnton Blanchard
commit cf9efce0ce31 (powerpc: Account time using timebase rather than PURR) used in_irq() to detect if the time was spent in interrupt processing. This only catches hardirq context so if we are in softirq context and in the idle loop we end up accounting it as idle time. If we instead use in_interrupt() we catch both softirq and hardirq time. The issue was found when running a network intensive workload. top showed the following: 0.0%us, 1.1%sy, 0.0%ni, 85.7%id, 0.0%wa, 9.9%hi, 3.3%si, 0.0%st 85.7% idle. But this was wildly different to the perf events data. To confirm the suspicion I ran something to keep the core busy: # yes > /dev/null & 8.2%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 10.3%hi, 81.4%si, 0.0%st We only got 8.2% of the CPU for the userspace task and softirq has shot up to 81.4%. With the patch below top shows the correct stats: 0.0%us, 0.0%sy, 0.0%ni, 5.3%id, 0.0%wa, 13.3%hi, 81.3%si, 0.0%st Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21powerpc/cell: Use system_wq in cpufreq_spudemandTejun Heo
With cmwq, there's no reason to use a separate workqueue in cpufreq_spudemand. Use system_wq instead. The work items are already sync canceled on stop, so it's already guaranteed that no work is running when spu_gov_exit() is entered. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: Dave Jones <davej@redhat.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09powerpc/time: printk time stamp init not correctHeiko Schocher
problem: I see sometimes on my mpc5200 based board such printk timing information: [ 0.000000] NR_IRQS:512 nr_irqs:512 16 [ 0.000000] MPC52xx PIC is up and running! [ 0.000000] clocksource: timebase mult[79364d9] shift[22] registered [ 0.000000] console [ttyPSC0] enabled [ 130.300633] pid_max: default: 32768 minimum: 301 [ 130.305647] Mount-cache hash table entries: 512 [ 130.315818] NET: Registered protocol family 16 reason: if the tbu not starts from 0 when linux boots, boot_tb maybe could not store the real 64 bit tbu value, because boot_tp is only a 32 bit unsigned long. solution: change boot_tb to u64 [BenH: Made it u64 instead of unsigned long long] Signed-off-by: Heiko Schocher <hs@denx.de> cc: Wolfgang Denk <wd@denx.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-10-21Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits) powerpc/44x: Update ppc44x_defconfig powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option fsl_rio: Add comments for sRIO registers. powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig powerpc/fsl-booke: Add p5020 DS board support powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes powerpc/fsl-booke: Add support for FSL 64-bit e5500 core powerpc/85xx: add cache-sram support powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board powerpc: Fix compile error with paca code on ppc64e powerpc/fsl-booke: Add p3041 DS board support oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt. powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers powerpc/fsl_booke: Add support to boot from core other than 0 powerpc/p1022: Add probing for individual DMA channels powerpc/fsl_soc: Search all global-utilities nodes for rstccr powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT powerpc/mpc83xx: Support for MPC8308 P1M board ... Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c
2010-10-18irq_work: Add generic hardirq context callbacksPeter Zijlstra
Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14powerpc: export ppc_proc_freq and ppc_tb_freq as GPL symbolsTimur Tabi
Export the global variable 'ppc_tb_freq', so that modules (like the Book-E watchdog driver) can use it. To maintain consistency, ppc_proc_freq is changed to a GPL-only export. This is okay, because any module that needs this symbol should be an actual Linux driver, which must be GPL-licensed. Signed-off-by: Timur Tabi <timur@freescale.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-09-02powerpc/pseries: Re-enable dispatch trace log userspace interfacePaul Mackerras
Since the cpu accounting code uses the hypervisor dispatch trace log now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory in that case. This restores those files. To do this, we now have a hook that the cpu accounting code will call as it processes each entry from the hypervisor dispatch trace log. The code in dtl.c now uses that to fill up its ring buffer, rather than having the hypervisor fill the ring buffer directly. This also fixes dtl_file_read() to handle overflow conditions a bit better and adds a spinlock to ensure that race conditions (multiple processes opening or reading the file concurrently) are handled correctly. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02powerpc: Account time using timebase rather than PURRPaul Mackerras
Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-31powerpc/perf_event: Reduce latency of calling perf_event_do_pendingPaul Mackerras
Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") moved the call to perf_event_do_pending in timer_interrupt() down so that it was after the irq_enter() call. Unfortunately this moved it after the code that checks whether it is time for the next decrementer clock event. The result is that the call to perf_event_do_pending() won't happen until the next decrementer clock event is due. This was pointed out by Milton Miller. This fixes it by moving the check for whether it's time for the next decrementer clock event down to the point where we're about to call the event handler, after we've called perf_event_do_pending. This has the side effect that on old pre-Core99 Powermacs where we use the ppc_n_lost_interrupts mechanism to replay interrupts, a replayed interrupt will incur a little more latency since it will now do the code from the irq_enter down to the irq_exit, that it used to skip. However, these machines are now old and rare enough that this doesn't matter. To make it clear that ppc_n_lost_interrupts is only used on Powermacs, and to speed up the code slightly on non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts is now conditional on CONFIG_PMAC as well as CONFIG_PPC32. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-28Merge branch 'powerpc.cherry-picks' into timers/clocksourceThomas Gleixner
Conflicts: arch/powerpc/kernel/time.c Reason: The powerpc next tree contains two commits which conflict with the timekeeping changes: 8fd63a9e powerpc: Rework VDSO gettimeofday to prevent time going backwards c1aa687d powerpc: Clean up obsolete code relating to decrementer and timebase John Stultz identified them and provided the conflict resolution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-28powerpc: Clean up obsolete code relating to decrementer and timebasePaul Mackerras
Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-28powerpc: Rework VDSO gettimeofday to prevent time going backwardsPaul Mackerras
Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-27timkeeping: Fix update_vsyscall to provide wall_to_monotonic offsetJohn Stultz
update_vsyscall() did not provide the wall_to_monotoinc offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27powerpc: Cleanup xtime usageJohn Stultz
This removes powerpc's direct xtime usage, allowing for further generic timeekeping cleanups Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27powerpc: Simplify update_vsyscallJohn Stultz
Currently powerpc's update_vsyscall calls an inline update_gtod. However, both are straightforward, and there are no other users, so this patch merges update_gtod into update_vsyscall. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-12powerpc/perf_event: Fix oops due to perf_event_do_pending callPaul Mackerras
Anton Blanchard found that large POWER systems would occasionally crash in the exception exit path when profiling with perf_events. The symptom was that an interrupt would occur late in the exit path when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts should be hard-disabled at this point but they were enabled. Because the interrupt was not recoverable the system panicked. The reason is that the exception exit path was calling perf_event_do_pending after hard-disabling interrupts, and perf_event_do_pending will re-enable interrupts. The simplest and cleanest fix for this is to use the same mechanism that 32-bit powerpc does, namely to cause a self-IPI by setting the decrementer to 1. This means we can remove the tests in the exception exit path and raw_local_irq_restore. This also makes sure that the call to perf_event_do_pending from timer_interrupt() happens within irq_enter/irq_exit. (Note that calling perf_event_do_pending from timer_interrupt does not mean that there is a possible 1/HZ latency; setting the decrementer to 1 ensures that the timer interrupt will happen immediately, i.e. within one timebase tick, which is a few nanoseconds or 10s of nanoseconds.) Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc: Add timer, performance monitor and machine check counts to ↵Anton Blanchard
/proc/interrupts With NO_HZ it is useful to know how often the decrementer is going off. The patch below adds an entry for it and also adds it into the /proc/stat summaries. While here, I added performance monitoring and machine check exceptions. I found it useful to keep an eye on the PMU exception rate when using the perf tool. Since it's possible to take a completely handled machine check on a System p box it also sounds like a good idea to keep a machine check summary. The event naming matches x86 to keep gratuitous differences to a minimum. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-09powerpc: Only print clockevent settings onceAnton Blanchard
The clockevent multiplier and shift is useful information, but we only need to print it once. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03powerpc: Replace per_cpu(, smp_processor_id()) with __get_cpu_var()Anton Blanchard
The cputime code has a few places that do per_cpu(, smp_processor_id()). Replace them with __get_cpu_var(). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-01-15powerpc: Fix decrementer setup on 1GHz boardsStefan Roese
We noticed that recent kernels didn't boot on our 1GHz Canyonlands 460EX boards anymore. As it seems, patch 8d165db1 [powerpc: Improve decrementer accuracy] introduced this problem. The routine div_sc() overflows with shift = 32 resulting in this incorrect setup: time_init: decrementer frequency = 1000.000012 MHz time_init: processor frequency = 1000.000012 MHz clocksource: timebase mult[400000] shift[22] registered clockevent: decrementer mult[33] shift[32] cpu[0] This patch now introduces a local div_dc64() version of this function so that this overflow doesn't happen anymore. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Detlev Zundel <dzu@denx.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-12Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits) powerpc: Fix usage of 64-bit instruction in 32-bit altivec code MAINTAINERS: Add PowerPC patterns powerpc/pseries: Track previous CPPR values to correctly EOI interrupts powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP powerpc: Make "intspec" pointers in irq_host->xlate() const powerpc/8xx: DTLB Miss cleanup powerpc/8xx: Remove DIRTY pte handling in DTLB Error. powerpc/8xx: Start using dcbX instructions in various copy routines powerpc/8xx: Restore _PAGE_WRITETHRU powerpc/8xx: Add missing Guarded setting in DTLB Error. powerpc/8xx: Fixup DAR from buggy dcbX instructions. powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions. powerpc/8xx: Update TLB asm so it behaves as linux mm expects. powerpc/8xx: Invalidate non present TLBs powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate pseries/pseries: Add code to online/offline CPUs of a DLPAR node powerpc: stop_this_cpu: remove the cpu from the online map. powerpc/pseries: Add kernel based CPU DLPAR handling sysfs/cpu: Add probe/release files powerpc/pseries: Kernel DLPAR Infrastructure ...
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Conflicts: include/linux/kvm.h
2009-12-08Merge branch 'timers-for-linus-urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Fix /proc/timer_list regression itimers: Fix racy writes to cpu_itimer fields timekeeping: Fix clock_gettime vsyscall time warp
2009-12-08Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers, init: Limit the number of per cpu calibration bootup messages posix-cpu-timers: optimize and document timer_create callback clockevents: Add missing include to pacify sparse x86: vmiclock: Fix printk format x86: Fix printk format due to variable type change sparc: fix printk for change of variable type clocksource/events: Fix fallout of generic code changes nohz: Allow 32-bit machines to sleep for more than 2.15 seconds nohz: Track last do_timer() cpu nohz: Prevent clocksource wrapping during idle nohz: Type cast printk argument mips: Use generic mult/shift factor calculation for clocks clocksource: Provide a generic mult/shift factor calculation clockevents: Use u32 for mult and shift factors nohz: Introduce arch_needs_cpu nohz: Reuse ktime in sub-functions of tick_check_idle. time: Remove xtime_cache time: Implement logarithmic time accumulation
2009-11-17timekeeping: Fix clock_gettime vsyscall time warpLin Ming
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with the unmodified clock multiplier of the clock source and not with the NTP adjusted multiplier of the timekeeper. This causes user space observerable time warps: new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf Add a new argument "mult" to update_vsyscall() and hand in the timekeeping internal NTP adjusted multiplier. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-15Merge branches 'perf/powerpc' and 'perf/bench' into perf/coreIngo Molnar
Merge reason: Both 'perf bench' and the pending PowerPC changes are now ready for the next merge window. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-14clocksource/events: Fix fallout of generic code changesThomas Gleixner
powerpc grew a new warning due to the type change of clockevent->mult. The architectures which use parts of the generic time keeping infrastructure tripped over my wrong assumption that clocksource_register is only used when GENERIC_TIME=y. I should have looked and also I should have known better. These renitent Gaul villages are racking my nerves. Some serious deprecating is due. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-12Merge commit 'origin/master' into nextBenjamin Herrenschmidt
2009-11-05powerpc: Avoid giving out RTC dates below EPOCHBenjamin Herrenschmidt
Doing so causes xtime to be negative which crashes the timekeeping code in funny ways when doing suspend/resume Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05Export symbols for KVM moduleAlexander Graf
We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-28powerpc: tracing: Add powerpc tracepoints for timer entry and exitAnton Blanchard
We can monitor the effectiveness of our power management of both the kernel and hypervisor by probing the timer interrupt. For example, on this box we see 10.37s timer interrupts on an idle core: <idle>-0 [010] 3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10 <idle>-0 [010] 3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10 Since we have a 207MHz decrementer it will go negative and fire every 10.37s even if Linux is completely idle. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-09-23Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: itimers: Add tracepoints for itimer hrtimer: Add tracepoint for hrtimers timers: Add tracepoints for timer_list timers cputime: Optimize jiffies_to_cputime(1) itimers: Simplify arm_timer() code a bit itimers: Fix periodic tics precision itimers: Merge ITIMER_VIRT and ITIMER_PROF Trivial header file include conflicts in kernel/fork.c
2009-09-21perf: Do the big rename: Performance Counters -> Performance EventsIngo Molnar
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits) time: Prevent 32 bit overflow with set_normalized_timespec() clocksource: Delay clocksource down rating to late boot clocksource: clocksource_select must be called with mutex locked clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash timers: Drop a function prototype clocksource: Resolve cpu hotplug dead lock with TSC unstable timer.c: Fix S/390 comments timekeeping: Fix invalid getboottime() value timekeeping: Fix up read_persistent_clock() breakage on sh timekeeping: Increase granularity of read_persistent_clock(), build fix time: Introduce CLOCK_REALTIME_COARSE x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown clocksource: Avoid clocksource watchdog circular locking dependency clocksource: Protect the watchdog rating changes with clocksource_mutex clocksource: Call clocksource_change_rating() outside of watchdog_lock timekeeping: Introduce read_boot_clock timekeeping: Increase granularity of read_persistent_clock() timekeeping: Update clocksource with stop_machine timekeeping: Add timekeeper read_clock helper functions timekeeping: Move NTP adjusted clock multiplier to struct timekeeper ... Fix trivial conflict due to MIPS lemote -> loongson renaming.
2009-08-29Merge branch 'timers/posixtimers' into timers/tracingThomas Gleixner
Merge reason: timer tracepoint patches depend on both branches Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-28powerpc: Properly start decrementer on BookE secondary CPUsBenjamin Herrenschmidt
This moves the code to start the decrementer on 40x and BookE into a separate function which is now called from time_init() and secondary_time_init(), before the respective clock sources are registered. We also remove the 85xx specific code for doing it from the platform code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-23timekeeping: Increase granularity of read_persistent_clock(), build fixMartin Schwidefsky
Fix the following build problem on powerpc: arch/powerpc/kernel/time.c: In function 'read_persistent_clock': arch/powerpc/kernel/time.c:788: error: 'return' with a value, in function returning void arch/powerpc/kernel/time.c:791: error: 'return' with a value, in function returning void Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: dwalker@fifo99.com Cc: johnstul@us.ibm.com LKML-Reference: <20090822222313.74b9619c@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-20powerpc: Use DIV_ROUND_CLOSEST in time init codeJulia Lawall
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d but is perhaps more readable. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression x,__divisor; @@ - (((x) + ((__divisor) / 2)) / (__divisor)) + DIV_ROUND_CLOSEST(x,__divisor) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-15timekeeping: Increase granularity of read_persistent_clock()Martin Schwidefsky
The persistent clock of some architectures (e.g. s390) have a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system change the read_persistent_clock function to return a struct timespec. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134811.013873340@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-03cputime: Optimize jiffies_to_cputime(1)Stanislaw Gruszka
For powerpc with CONFIG_VIRT_CPU_ACCOUNTING jiffies_to_cputime(1) is not compile time constant and run time calculations are quite expensive. To optimize we use precomputed value. For all other architectures is is preprocessor definition. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18perf_counter: powerpc: Enable use of software counters on 32-bit powerpcPaul Mackerras
This enables the perf_counter subsystem on 32-bit powerpc. Since we don't have any support for hardware counters on 32-bit powerpc yet, only software counters can be used. Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as 64-bit, the main thing this does is add an implementation of set_perf_counter_pending(). This needs to arrange for perf_counter_do_pending() to be called when interrupts are enabled. Rather than add code to local_irq_restore as 64-bit does, the 32-bit set_perf_counter_pending() generates an interrupt by setting the decrementer to 1 so that a decrementer interrupt will become pending in 1 or 2 timebase ticks (if a decrementer interrupt isn't already pending). When interrupts are enabled, timer_interrupt() will be called, and some new code in there calls perf_counter_do_pending(). We use a per-cpu array of flags to indicate whether we need to call perf_counter_do_pending() or not. This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT, which is selected by processor families for which we have hardware PMU support (currently only PPC64), and PPC_PERF_CTRS, which enables the powerpc-specific perf_counter back-end. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linuxppc-dev@ozlabs.org Cc: benh@kernel.crashing.org LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15powerpc: Don't do generic calibrate_delay()Benjamin Herrenschmidt
Currently we are wasting time calling the generic calibrate_delay() function. We don't need it since our implementation of __delay() is based on the CPU timebase. So instead, we use our own small implementation that initializes loops_per_jiffy to something sensible to make the few users like spinlock debug be happy Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21powerpc: Improve decrementer accuracyAnton Blanchard
I have been looking at sources of OS jitter and notice that after a long NO_HZ idle period we wakeup too early: relative time (us) event timer irq exit 999946.405 timer irq entry 4.835 timer irq exit 21.685 timer irq entry 3.540 timer (tick_sched_timer) entry Here we slept for just under a second then took a timer interrupt that did nothing. 21.685 us later we wake up again and do the work. We set a rather low shift value of 16 for the decrementer clockevent, which I think is causing this issue. On this box we have a 207MHz decrementer and see: clockevent: decrementer mult[3501] shift[16] cpu[0] For calculations of large intervals this mult/shift combination could be off by a significant amount. I notice the sparc code has a loop that iterates to find a mult/shift combination that maximises the shift value while keeping mult under 32bit. With the patch below we get: clockevent: decrementer mult[35015c20] shift[32] cpu[15] And we no longer see the spurious wakeups. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-04-21clocksource: pass clocksource to read() callbackMagnus Damm
Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02powerpc: Hook up rtc-generic, and kill rtc-ppcGeert Uytterhoeven
PowerPC has been a long time user of the generic RTC abstraction, so hook up rtc-generic: - Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set, - Kill rtc-ppc, as rtc-generic offers the same functionality in a more generic way, and supports autoloading through udev. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2009-01-03Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting