summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-05-24nohz: Remove "Switched to NOHz mode" debugging messagesHeiko Carstens
When performing cpu hotplug tests the kernel printk log buffer gets flooded with pointless "Switched to NOHz mode..." messages. Especially when afterwards analyzing a dump this might have removed more interesting stuff out of the buffer. Assuming that switching to NOHz mode simply works just remove the printk. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24PM / Sleep: Fix race between CPU hotplug and freezerSrivatsa S. Bhat
The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down() functions depend on the value of the 'tasks_frozen' argument passed to them (which indicates whether tasks have been frozen or not). (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN, CPU_DEAD, CPU_DEAD_FROZEN). Thus, it is essential that while the callbacks for those notifications are running, the state of the system with respect to the tasks being frozen or not remains unchanged, *throughout that duration*. Hence there is a need for synchronizing the CPU hotplug code with the freezer subsystem. Since the freezer is involved only in the Suspend/Hibernate call paths, this patch hooks the CPU hotplug code to the suspend/hibernate notifiers PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent the race between CPU hotplug and freezer, thus ensuring that CPU hotplug notifications will always be run with the state of the system really being what the notifications indicate, _throughout_ their execution time. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-05-08sched: Cleanup cpu_active madnessPeter Zijlstra
Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-04-20Merge remote branch 'fsl-linux-sdk/imx_3.0.15_12.04.01' into imx_3.0.15_androidXinyu Chen
Conflicts: arch/arm/kernel/traps.c arch/arm/mach-mx6/board-mx6q_sabresd.c arch/arm/mach-mx6/cpu.c arch/arm/mach-mx6/system.c
2012-04-10futex: Do not leak robust list to unprivileged processKees Cook
It was possible to extract the robust list head address from a setuid process if it had used set_robust_list(), allowing an ASLR info leak. This changes the permission checks to be the same as those used for similar info that comes out of /proc. Running a setuid program that uses robust futexes would have had: cred->euid != pcred->euid cred->euid == pcred->uid so the old permissions check would allow it. I'm not aware of any setuid programs that use robust futexes, so this is just a preventative measure. (This patch is based on changes from grsecurity.) Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: kernel-hardening@lists.openwall.com Cc: spender@grsecurity.net Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Huang Shijie <b32955@freescale.com>
2012-04-10futex: Simplify return logicThomas Gleixner
No need to assign ret in each case and break. Simply return the result of the handler function directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Huang Shijie <b32955@freescale.com>
2012-04-10futex: Cover all PI opcodes with cmpxchg enabled checkThomas Gleixner
Some of the newer futex PI opcodes do not check the cmpxchg enabled variable and call unconditionally into the handling functions. Cover all PI opcodes in a separate check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Huang Shijie <b32955@freescale.com>
2012-03-30Merge remote branch 'fsl-linux-sdk/imx_3.0.15' into imx_3.0.15_4.6.6Xinyu Chen
Conflicts: arch/arm/configs/imx6_defconfig arch/arm/configs/imx6_updater_defconfig arch/arm/mach-mx6/board-mx6q_sabreauto.c arch/arm/mach-mx6/board-mx6q_sabresd.c arch/arm/mach-mx6/clock.c arch/arm/mach-mx6/localtimer.c drivers/cpufreq/Makefile drivers/cpufreq/cpufreq_interactive.c drivers/input/keyboard/gpio_keys.c drivers/media/video/mxc/capture/Kconfig drivers/media/video/mxc/capture/mxc_v4l2_capture.c drivers/mmc/card/block.c drivers/mmc/host/sdhci-esdhc-imx.c drivers/mxc/gpu-viv/hal/os/linux/kernel/gc_hal_kernel_device.c drivers/usb/otg/fsl_otg.c drivers/video/mxc/mxc_ipuv3_fb.c include/linux/fsl_devices.h include/linux/mmc/host.h sound/soc/imx/Kconfig sound/soc/imx/Makefile sound/soc/imx/imx-hdmi-dma.c sound/soc/imx/imx-wm8958.c
2012-03-26ENGR00177745-1 Add interactive cpufreq governorAnson Huang
cpufreq: interactive: New 'interactive' governor This governor is designed for latency-sensitive workloads, such as interactive user interfaces. The interactive governor aims to be significantly more responsive to ramp CPU quickly up when CPU-intensive activity begins. Existing governors sample CPU load at a particular rate, typically every X ms. This can lead to under-powering UI threads for the period of time during which the user begins interacting with a previously-idle system until the next sample period happens. The 'interactive' governor uses a different approach. Instead of sampling the CPU at a specified rate, the governor will check whether to scale the CPU frequency up soon after coming out of idle. When the CPU comes out of idle, a timer is configured to fire within 1-2 ticks. If the CPU is very busy from exiting idle to when the timer fires then we assume the CPU is underpowered and ramp to MAX speed. If the CPU was not sufficiently busy to immediately ramp to MAX speed, then the governor evaluates the CPU load since the last speed adjustment, choosing the highest value between that longer-term load or the short-term load since idle exit to determine the CPU speed to ramp to. A realtime thread is used for scaling up, giving the remaining tasks the CPU performance benefit, unlike existing governors which are more likely to schedule rampup work to occur after your performance starved tasks have completed. The tuneables for this governor are: /sys/devices/system/cpu/cpufreq/interactive/min_sample_time: The minimum amount of time to spend at the current frequency before ramping down. This is to ensure that the governor has seen enough historic CPU load data to determine the appropriate workload. /sys/devices/system/cpu/cpufreq/interactive/go_maxspeed_load The CPU load at which to ramp to max speed. Signed-off-by: Anson Huang <b20788@freescale.com>
2012-03-16ENGR00175844 irq: enable IRQF_EARLY_RESUME irq when dpm_suspend_noirq failedXinyu Chen
The dpm_suspend_noirq routing calls suspend_device_irqs() first to disable all the irq no matter what flags it has. Then it enumerates the device driver on dpm_suspend_list to call their suspend_noirq pm callback. If any dev suspend failed, it will call dpm_resume_noirq to re-enable all the irq without IRQF_EARLY_RESUME, and return failed. If dpm_suspend_noirq() return failed, then no syscore_resume() can be called, that means the irq with flag IRQF_EARLY_RESUME, can not be re-enabled on this case. error = dpm_suspend_noirq(PMSG_SUSPEND); if (error) { .. goto Platform_finish; } .... error = syscore_suspend(); if (!error) { ... syscore_resume(); } ... dpm_resume_noirq(PMSG_RESUME); Platform_finish: if (suspend_ops->finish) suspend_ops->finish(); So we must enable all the irqs no matter it's IRQF_EARLY_RESUME or not. Otherwise the GPIO power key who's irq has flag of IRQF_EARLY_RESUME will be disabled forever when some device failed to suspend. Signed-off-by: Xinyu Chen <xinyu.chen@freescale.com>
2012-03-12ENGR00173756-2 cpufreq: fix license issueXinyu Chen
Correct the license header of cpufreq earlysuspend governor switch driver. Signed-off-by: Xinyu Chen <xinyu.chen@freescale.com>
2012-02-28futex: Fix uninterruptible loop due to gate_areaHugh Dickins
It was found (by Sasha) that if you use a futex located in the gate area we get stuck in an uninterruptible infinite loop, much like the ZERO_PAGE issue. While looking at this problem, PeterZ realized you'll get into similar trouble when hitting any install_special_pages() mapping. And are there still drivers setting up their own special mmaps without page->mapping, and without special VM or pte flags to make get_user_pages fail? In most cases, if page->mapping is NULL, we do not need to retry at all: Linus points out that even /proc/sys/vm/drop_caches poses no problem, because it ends up using remove_mapping(), which takes care not to interfere when the page reference count is raised. But there is still one case which does need a retry: if memory pressure called shmem_writepage in between get_user_pages_fast dropping page table lock and our acquiring page lock, then the page gets switched from filecache to swapcache (and ->mapping set to NULL) whatever the refcount. Fault it back in to get the page->mapping needed for key->shared.inode. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-28futex: Fix spelling in a source code commentBart Van Assche
Change a single occurrence of "unlcoked" into "unlocked". Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-28futex: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/futex.c: In function 'fixup_pi_state_owner.clone.17': kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function kernel/futex.c: In function 'handle_futex_death': kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function kernel/futex.c: In function 'do_futex': kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function kernel/futex.c:828:6: note: 'curval' was declared here kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function kernel/futex.c:890:6: note: 'oldval' was declared here Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-28plist: Remove the need to supply locks to plist headsDima Zavin
This was legacy code brought over from the RT tree and is no longer necessary. Signed-off-by: Dima Zavin <dima@android.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Walker <dwalker@codeaurora.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-27ENGR00175308-1 MX6Q PM: cpu hotplug on earlysuspendLin Fuzhen
Add cpu hotplug support when system entry in earlysyspend. With this patch, when system entery in earlysuspend, it will record the online cpus, then hotplut none-bootable cpus, and in late resume, it will boot up the recorded hotplug cpus. Signed-off-by: Lin Fuzhen <fuzhen.lin@freescale.com>
2012-02-08ENGR00173756-1 Android: cpufreq: add cpu frquency governor save/restore ↵Zhang Jiejing
function. improve the power consumption situation for audio playback, background downloading,etc tasks. the scaling governor is set to conservative when display is turned off and the default governor is saved. The governor is restored when display is turned on. Signed-off-by: Wen Yi <wyi@nvidia.com> Signed-off-by: Zhang Jiejing <jiejing.zhang@freescale.com>
2012-02-02Merge branch 'android-3.0' into imx_3.0.15_androidXinyu Chen
Conflicts: drivers/misc/Kconfig drivers/misc/Makefile drivers/net/wireless/Makefile kernel/power/main.c sound/soc/soc-core.c
2012-01-09ENGR00156637 [MX6]Reboot take long time on SMPAnson Huang
Add work around to the reboot issue of SMP, with SMP, all the CPUs need to do _rcu_barrier, if we enqueue an rcu callback, we need to make sure CPU tick to stay alive until we take care of those by completing the appropriate grace period. This work around only work when the reboot command issue, so it didn't impact normal kernel feature. Signed-off-by: Anson Huang <b20788@freescale.com>
2012-01-09ENGR00133978 PM: add time sensitive debug function to suspend & resumeZhang Jiejing
There was some driver is slow on suspend/resume, but some embeded system like eReader,Cellphone are time sensitive,this commit will report the slow driver on suspend/resume, the default value is 500us(0.5ms) Also, the threshold can be change by modify '/sys/power/device_suspend_time_threshold' to change the threshold, it is in microsecond. The output is like: PM: device platform:soc-audio.2 suspend too slow, takes 606.696 msecs PM: device platform:mxc_sdc_fb.1 suspend too slow, takes 7.708 msecs the default state of suspend driver is default off, if you want to debug the suspend time, echo time in microsecond(u Second) to /sys/powe/device_suspend_time_threshold eg: I want to know which driver suspend & resume takes more that 0.5 ms (500 us), you can just : ehco 500 > /sys/power/device_suspend_time_threshold Signed-off-by: Zhang Jiejing <jiejing.zhang@freescale.com>
2012-01-03Revert "clockevents: Set noop handler in clockevents_exchange_device()"Linus Torvalds
commit 3b87487ac5008072f138953b07505a7e3493327f upstream. This reverts commit de28f25e8244c7353abed8de0c7792f5f883588c. It results in resume problems for various people. See for example http://thread.gmane.org/gmane.linux.kernel/1233033 http://thread.gmane.org/gmane.linux.kernel/1233389 http://thread.gmane.org/gmane.linux.kernel/1233159 http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877 and the fedora and ubuntu bug reports https://bugzilla.redhat.com/show_bug.cgi?id=767248 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569 which got bisected down to the stable version of this commit. Reported-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Phil Miller <mille121@illinois.edu> Reported-by: Philip Langdale <philipl@overt.org> Reported-by: Tim Gardner <tim.gardner@canonical.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21Make TASKSTATS require root accessLinus Torvalds
commit 1a51410abe7d0ee4b1d112780f46df87d3621043 upstream. Ok, this isn't optimal, since it means that 'iotop' needs admin capabilities, and we may have to work on this some more. But at the same time it is very much not acceptable to let anybody just read anybody elses IO statistics quite at this level. Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative to checking the capabilities by hand. Reported-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Johannes Berg <johannes.berg@intel.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Moritz Mühlenhoff <jmm@inutil.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21alarmtimers: Fix time comparisonThomas Gleixner
commit c9c024b3f3e07d087974db4c0dc46217fff3a6c0 upstream. The expiry function compares the timer against current time and does not expire the timer when the expiry time is >= now. That's wrong. If the timer is set for now, then it must expire. Make the condition expiry > now for breaking out the loop. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09clockevents: Set noop handler in clockevents_exchange_device()Thomas Gleixner
commit de28f25e8244c7353abed8de0c7792f5f883588c upstream. If a device is shutdown, then there might be a pending interrupt, which will be processed after we reenable interrupts, which causes the original handler to be run. If the old handler is the (broadcast) periodic handler the shutdown state might hang the kernel completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09clocksource: Fix bug with max_deferment margin calculationYang Honggang (Joseph)
commit b1f919664d04a8d0ba29cb76673c7ca3325a2006 upstream. In order to leave a margin of 12.5% we should >> 3 not >> 5. Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com> [jstultz: Modified commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09jump_label: jump_label_inc may return before the code is patchedGleb Natapov
commit bbbf7af4bf8fc69bc751818cf30521080fa47dcb upstream. If cpu A calls jump_label_inc() just after atomic_add_return() is called by cpu B, atomic_inc_not_zero() will return value greater then zero and jump_label_inc() will return to a caller before jump_label_update() finishes its job on cpu B. Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09tick-broadcast: Stop active broadcast device when replacing itThomas Gleixner
commit c1be84309c58b1e7c6d626e28fba41a22b364c3d upstream. When a better rated broadcast device is installed, then the current active device is not disabled, which results in two running broadcast devices. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09tracing: fix event_subsystem ref countingIlya Dryomov
commit cb59974742aea24adf6637eb0c4b8e7b48bca6fb upstream. Fix a bug introduced by e9dbfae5, which prevents event_subsystem from ever being released. Ref_count was added to keep track of subsystem users, not for counting events. Subsystem is created with ref_count = 1, so there is no need to increment it for every event, we have nr_events for that. Fix this by touching ref_count only when we actually have a new user - subsystem_open(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09trace_events_filter: Use rcu_assign_pointer() when setting ↵Tejun Heo
ftrace_event_call->filter commit d3d9acf646679c1981032b0985b386d12fccc60c upstream. ftrace_event_call->filter is sched RCU protected but didn't use rcu_assign_pointer(). Use it. TODO: Add proper __rcu annotation to call->filter and all its users. -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric. Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09genirq: Fix race condition when stopping the irq threadIdo Yariv
commit 550acb19269d65f32e9ac4ddb26c2b2070e37f1c upstream. In irq_wait_for_interrupt(), the should_stop member is verified before setting the task's state to TASK_INTERRUPTIBLE and calling schedule(). In case kthread_stop sets should_stop and wakes up the process after should_stop is checked by the irq thread but before the task's state is changed, the irq thread might never exit: kthread_stop irq_wait_for_interrupt ------------ ---------------------- ... ... while (!kthread_should_stop()) { kthread->should_stop = 1; wake_up_process(k); wait_for_completion(&kthread->exited); ... set_current_state(TASK_INTERRUPTIBLE); ... schedule(); } Fix this by checking if the thread should stop after modifying the task's state. [ tglx: Simplified it a bit ] Signed-off-by: Ido Yariv <ido@wizery.com> Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09hrtimer: Fix extra wakeups from __remove_hrtimer()Jeff Ohlstein
commit 27c9cd7e601632b3794e1c3344d37b86917ffb43 upstream. __remove_hrtimer() attempts to reprogram the clockevent device when the timer being removed is the next to expire. However, __remove_hrtimer() reprograms the clockevent *before* removing the timer from the timerqueue and thus when hrtimer_force_reprogram() finds the next timer to expire it finds the timer we're trying to remove. This is especially noticeable when the system switches to NOHz mode and the system tick is removed. The timer tick is removed from the system but the clockevent is programmed to wakeup in another HZ anyway. Silence the extra wakeup by removing the timer from the timerqueue before calling hrtimer_force_reprogram() so that we actually program the clockevent for the next timer to expire. This was broken by 998adc3 "hrtimers: Convert hrtimers to use timerlist infrastructure". Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org> Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09timekeeping: add arch_offset hook to ktime_get functionsHector Palacios
commit d004e024058a0eaca097513ce62cbcf978913e0a upstream. ktime_get and ktime_get_ts were calling timekeeping_get_ns() but later they were not calling arch_gettimeoffset() so architectures using this mechanism returned 0 ns when calling these functions. This happened for example when running Busybox's ping which calls syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually calls ktime_get. As a result the returned ping travel time was zero. Signed-off-by: Hector Palacios <hector.palacios@digi.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09cgroup_freezer: fix freezing groups with stopped tasksMichal Hocko
commit 884a45d964dd395eda945842afff5e16bcaedf56 upstream. 2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state transitions) removed is_task_frozen_enough and replaced it with a simple frozen call. This, however, breaks freezing for a group with stopped tasks because those cannot be frozen and so the group remains in CGROUP_FREEZING state (update_if_frozen doesn't count stopped tasks) and never reaches CGROUP_FROZEN. Let's add is_task_frozen_enough back and use it at the original locations (update_if_frozen and try_to_freeze_cgroup). Semantically we consider stopped tasks as frozen enough so we should consider both cases when testing frozen tasks. Testcase: mkdir /dev/freezer mount -t cgroup -o freezer none /dev/freezer mkdir /dev/freezer/foo sleep 1h & pid=$! kill -STOP $pid echo $pid > /dev/freezer/foo/tasks echo FROZEN > /dev/freezer/foo/freezer.state while true do cat /dev/freezer/foo/freezer.state [ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break sleep 1 done echo OK Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Tomasz Buchert <tomasz.buchert@inria.fr> Cc: Paul Menage <paul@paulmenage.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09genirq: fix regression in irqfixup, irqpollEdward Donovan
commit 52553ddffad76ccf192d4dd9ce88d5818f57f62a upstream. Commit fa27271bc8d2("genirq: Fixup poll handling") introduced a regression that broke irqfixup/irqpoll for some hardware configurations. Amidst reorganizing 'try_one_irq', that patch removed a test that checked for 'action->handler' returning IRQ_HANDLED, before acting on the interrupt. Restoring this test back returns the functionality lost since 2.6.39. In the current set of tests, after 'action' is set, it must precede '!action->next' to take effect. With this and my previous patch to irq/spurious.c, c75d720fca8a, all IRQ regressions that I have encountered are fixed. Signed-off-by: Edward Donovan <edward.donovan@numble.net> Reported-and-tested-by: Rogério Brito <rbrito@ime.usp.br> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26genirq: Fix irqfixup, irqpoll regressionEdward Donovan
commit c75d720fca8a91ce99196d33adea383621027bf2 upstream. commit d05c65fff0 ("genirq: spurious: Run only one poller at a time") introduced a regression, leaving the boot options 'irqfixup' and 'irqpoll' non-functional. The patch placed tests in each function, to exit if the function is already running. The test in 'misrouted_irq' exited when it should have proceeded, effectively disabling 'misrouted_irq' and 'poll_spurious_irqs'. The check for an already running poller needs to be "!= 1" not "== 1" as "1" is the value when the first poller starts running. Signed-off-by: Edward Donovan <edward.donovan@numble.net> Cc: maciej.rutecki@gmail.com Link: http://lkml.kernel.org/r/1320175784-6745-1-git-send-email-edward.donovan@numble.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-22Fix "time: Catch invalid timespec sleep values in ↵Arve Hjønnevåg
__timekeeping_inject_sleeptime" to compile on 3.0 Change-Id: I1225f279cda04dedbfb7f853f6b58f1032bd6d2b
2011-11-22time: Catch invalid timespec sleep values in __timekeeping_inject_sleeptimeJohn Stultz
Arve suggested making sure we catch possible negative sleep time intervals that could be passed into timekeeping_inject_sleeptime. CC: Arve Hjønnevåg <arve@android.com> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-11PM / Suspend: Off by one in pm_suspend()Dan Carpenter
commit 528f7ce6e439edeac38f6b3f8561f1be129b5e91 upstream. In enter_state() we use "state" as an offset for the pm_states[] array. The pm_states[] array only has PM_SUSPEND_MAX elements so this test is off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11ptrace: don't clear GROUP_STOP_SIGMASK on double-stopOleg Nesterov
[This does not correspond to any specific patch in the upstream tree as it was fixed accidentally by rewriting the code in the 3.1 release] https://bugzilla.redhat.com/show_bug.cgi?id=740121 1. Luke Macken triggered WARN_ON(!(group_stop & GROUP_STOP_SIGMASK)) in do_signal_stop(). This is because do_signal_stop() clears GROUP_STOP_SIGMASK part unconditionally but doesn't update it if task_is_stopped(). 2. Looking at this problem I noticed that WARN_ON_ONCE(!ptrace) is not right, a stopped-but-resumed tracee can clone the untraced thread in the SIGNAL_STOP_STOPPED group, the new thread can start another group-stop. Remove this warning, we need more fixes to make it true. Reported-by: Luke Macken <lmacken@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlierIan Campbell
commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream. This adds a mechanism to resume selected IRQs during syscore_resume instead of dpm_resume_noirq. Under Xen we need to resume IRQs associated with IPIs early enough that the resched IPI is unmasked and we can therefore schedule ourselves out of the stop_machine where the suspend/resume takes place. This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11tracing: Fix returning of duplicate data after EOF in trace_pipe_rawSteven Rostedt
commit 436fc280261dcfce5af38f08b89287750dc91cd2 upstream. The trace_pipe_raw handler holds a cached page from the time the file is opened to the time it is closed. The cached page is used to handle the case of the user space buffer being smaller than what was read from the ring buffer. The left over buffer is held in the cache so that the next read will continue where the data left off. After EOF is returned (no more data in the buffer), the index of the cached page is set to zero. If a user app reads the page again after EOF, the check in the buffer will see that the cached page is less than page size and will return the cached page again. This will cause reading the trace_pipe_raw again after EOF to return duplicate data, making the output look like the time went backwards but instead data is just repeated. The fix is to not reset the index right after all data is read from the cache, but to reset it after all data is read and more data exists in the ring buffer. Reported-by: Jeremy Eder <jeder@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11time: Change jiffies_to_clock_t() argument type to unsigned longhank
commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream. The parameter's origin type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11kmod: prevent kmod_loop_msg overflow in __request_module()Jiri Kosina
commit 37252db6aa576c34fd794a5a54fb32d7a8b3a07a upstream. Due to post-increment in condition of kmod_loop_msg in __request_module(), the system log can be spammed by much more than 5 instances of the 'runaway loop' message if the number of events triggering it makes the kmod_loop_msg to overflow. Fix that by making sure we never increment it past the threshold. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-27Merge commit 'v3.0.8' into android-3.0Colin Cross
2011-10-25cputimer: Cure lock inversionPeter Zijlstra
commit bcd5cff7216f9b2de0a148cc355eac199dc6f1cf upstream. There's a lock inversion between the cputimer->lock and rq->lock; notably the two callchains involved are: update_rlimit_cpu() sighand->siglock set_process_cpu_timer() cpu_timer_sample_group() thread_group_cputimer() cputimer->lock thread_group_cputime() task_sched_runtime() ->pi_lock rq->lock scheduler_tick() rq->lock task_tick_fair() update_curr() account_group_exec() cputimer->lock Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and the second one is keeping up-to-date. This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure SMP accounting oddities"). Cure the problem by removing the cputimer->lock and rq->lock nesting, this leaves concurrent enablers doing duplicate work, but the time wasted should be on the same order otherwise wasted spinning on the lock and the greater-than assignment filter should ensure we preserve monotonicity. Reported-by: Dave Jones <davej@redhat.com> Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-25Avoid using variable-length arrays in kernel/sys.cLinus Torvalds
commit a84a79e4d369a73c0130b5858199e949432da4c6 upstream. The size is always valid, but variable-length arrays generate worse code for no good reason (unless the function happens to be inlined and the compiler sees the length for the simple constant it is). Also, there seems to be some code generation problem on POWER, where Henrik Bakken reports that register r28 can get corrupted under some subtle circumstances (interrupt happening at the wrong time?). That all indicates some seriously broken compiler issues, but since variable length arrays are bad regardless, there's little point in trying to chase it down. "Just don't do that, then". Reported-by: Henrik Grindal Bakken <henribak@cisco.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16ftrace: Fix regression where ftrace breaks when modules are loadedSteven Rostedt
commit f7bc8b61f65726ff98f52e286b28e294499d7a08 upstream. Enabling function tracer to trace all functions, then load a module and then disable function tracing will cause ftrace to fail. This can also happen by enabling function tracing on the command line: ftrace=function and during boot up, modules are loaded, then you disable function tracing with 'echo nop > current_tracer' you will trigger a bug in ftrace that will shut itself down. The reason is, the new ftrace code keeps ref counts of all ftrace_ops that are registered for tracing. When one or more ftrace_ops are registered, all the records that represent the functions that the ftrace_ops will trace have a ref count incremented. If this ref count is not zero, when the code modification runs, that function will be enabled for tracing. If the ref count is zero, that function will be disabled from tracing. To make sure the accounting was working, FTRACE_WARN_ON()s were added to updating of the ref counts. If the ref count hits its max (> 2^30 ftrace_ops added), or if the ref count goes below zero, a FTRACE_WARN_ON() is triggered which disables all modification of code. Since it is common for ftrace_ops to trace all functions in the kernel, instead of creating > 20,000 hash items for the ftrace_ops, the hash count is just set to zero, and it represents that the ftrace_ops is to trace all functions. This is where the issues arrise. If you enable function tracing to trace all functions, and then add a module, the modules function records do not get the ref count updated. When the function tracer is disabled, all function records ref counts are subtracted. Since the modules never had their ref counts incremented, they go below zero and the FTRACE_WARN_ON() is triggered. The solution to this is rather simple. When modules are loaded, and their functions are added to the the ftrace pool, look to see if any ftrace_ops are registered that trace all functions. And for those, update the ref count for the module function records. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-10-16ftrace: Fix regression of :mod:module function enablingSteven Rostedt
commit 43dd61c9a09bd413e837df829e6bfb42159be52a upstream. The new code that allows different utilities to pick and choose what functions they trace broke the :mod: hook that allows users to trace only functions of a particular module. The reason is that the :mod: hook bypasses the hash that is setup to allow individual users to trace their own functions and uses the global hash directly. But if the global hash has not been set up, it will cause a bug: echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter produces: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip BUG: unable to handle kernel paging request at ffffffff8160ec90 IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 PGD 1a05067 PUD 1a09063 PMD 80000000016001e1 Oops: 0003 [#1] SMP Jul 7 04:02:28 phyllis kernel: [55303.858604] CPU 1 Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt Pid: 10344, comm: bash Tainted: G WC 3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ RIP: 0010:[<ffffffff810d9136>] [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP: 0018:ffff88003a96bda8 EFLAGS: 00010246 RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940 RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78 R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470) Stack: 0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5 ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80 ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00 Call Trace: [<ffffffff810d92f5>] match_records+0x155/0x1b0 [<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100 [<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210 [<ffffffff810db09f>] ftrace_filter_write+0xf/0x20 [<ffffffff81166e48>] vfs_write+0xc8/0x190 [<ffffffff81167001>] sys_write+0x51/0x90 [<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2 RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP <ffff88003a96bda8> CR2: ffffffff8160ec90 ---[ end trace a5d031828efdd88e ]--- Reported-by: Brian Marete <marete@toshnix.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16posix-cpu-timers: Cure SMP wobblesPeter Zijlstra
commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 upstream. David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16sched: Fix up wchan borkageSimon Kirby
commit 6ebbe7a07b3bc40b168d2afc569a6543c020d2e3 upstream. Commit c259e01a1ec ("sched: Separate the scheduler entry for preemption") contained a boo-boo wrecking wchan output. It forgot to put the new schedule() function in the __sched section and thereby doesn't get properly ignored for things like wchan. Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>