summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-05-23sched: Add a generic notifier when a task struct is about to be freedSan Mehat
This patch adds a notifier which can be used by subsystems that may be interested in when a task has completely died and is about to have it's last resource freed. The Android lowmemory killer uses this to determine when a task it has killed has finally given up its goods. Signed-off-by: San Mehat <san@google.com>
2010-05-17nohz: Allow 32-bit machines to sleep for more than 2.15 secondsJon Hunter
In the dynamic tick code, "max_delta_ns" (member of the "clock_event_device" structure) represents the maximum sleep time that can occur between timer events in nanoseconds. The variable, "max_delta_ns", is defined as an unsigned long which is a 32-bit integer for 32-bit machines and a 64-bit integer for 64-bit machines (if -m64 option is used for gcc). The value of max_delta_ns is set by calling the function "clockevent_delta2ns()" which returns a maximum value of LONG_MAX. For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates to ~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine is ~2.15 seconds, where as for a 64-bit machine it will be many years. This patch changes the type of max_delta_ns to be "u64" instead of "unsigned long" so that this variable is a 64-bit type for both 32-bit and 64-bit machines. It also changes the maximum value returned by clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit machine to sleep for longer than ~2.15 seconds. Please note that this patch also changes "min_delta_ns" to be "u64" too and although this is unnecessary, it makes the patch simpler as it avoids to fixup all callers of clockevent_delta2ns(). [ tglx: changed "unsigned long long" to u64 as we use this data type through out the time code ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-17clockevents: Use u32 for mult and shift factorsThomas Gleixner
The mult and shift factors of clock events differ in their data type from those of clock sources for no reason. u32 is sufficient for both. shift is always <= 32 and mult is limited to 2^32-1 to avoid 64bit multiplication overflows in the conversion. Preparatory patch for a generic mult/shift factor calculation function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20091111134229.725664788@linutronix.de>
2010-04-08kernel: Mapped irq chip default_disable to chip->mask.Alex Frid
Despite the claim in struct irq_chip header: "disable: disable the interrupt (defaults to chip->mask if NULL)", it is not happening as default_disable is empty. Fixed it. Should also fix bug 667376. Change-Id: If0c39e3b4344701bbf235201c180d9c8ce56c489 Reviewed-on: http://git-master/r/947 Tested-by: Aleksandr Frid <afrid@nvidia.com> Reviewed-by: Gary King <gking@nvidia.com> Tested-by: Gary King <gking@nvidia.com>
2010-04-08sched: Fix set_cpu_active() in cpu_down()Xiaotian Feng
Sachin found cpu hotplug test failures on powerpc, which made the kernel hang on his POWER box. The problem is that we fail to re-activate a cpu when a hot-unplug fails. Fix this by moving the de-activation into _cpu_down after doing the initial checks. Remove the synchronize_sched() calls and rely on those implied by rebuilding the sched domains using the new mask. Reported-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091216170517.500272612@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10Merge commit 'v2.6.32.9' into android-2.6.32Arve Hjønnevåg
2010-02-23Export the symbol of getboottime and mmonotonic_to_bootbasedJason Wang
commit c93d89f3dbf0202bf19c07960ca8602b48c2f9a0 upstream. Export getboottime and monotonic_to_bootbased in order to let them could be used by following patch. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23futex: Handle futex value corruption gracefullyThomas Gleixner
commit 59647b6ac3050dd964bc556fe6ef22f4db5b935c upstream. The WARN_ON in lookup_pi_state which complains about a mismatch between pi_state->owner->pid and the pid which we retrieved from the user space futex is completely bogus. The code just emits the warning and then continues despite the fact that it detected an inconsistent state of the futex. A conveniant way for user space to spam the syslog. Replace the WARN_ON by a consistency check. If the values do not match return -EINVAL and let user space deal with the mess it created. This also fixes the missing task_pid_vnr() when we compare the pi_state->owner pid with the futex value. Reported-by: Jermome Marchand <jmarchan@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23futex: Handle user space corruption gracefullyThomas Gleixner
commit 51246bfd189064079c54421507236fd2723b18f3 upstream. If the owner of a PI futex dies we fix up the pi_state and set pi_state->owner to NULL. When a malicious or just sloppy programmed user space application sets the futex value to 0 e.g. by calling pthread_mutex_init(), then the futex can be acquired again. A new waiter manages to enqueue itself on the pi_state w/o damage, but on unlock the kernel dereferences pi_state->owner and oopses. Prevent this by checking pi_state->owner in the unlock path. If pi_state->owner is not current we know that user space manipulated the futex value. Ignore the mess and return -EINVAL. This catches the above case and also the case where a task hijacks the futex by setting the tid value and then tries to unlock it. Reported-by: Jermome Marchand <jmarchan@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23futex_lock_pi() key refcnt fixMikael Pettersson
commit 5ecb01cfdf96c5f465192bdb2a4fd4a61a24c6cc upstream. This fixes a futex key reference count bug in futex_lock_pi(), where a key's reference count is incremented twice but decremented only once, causing the backing object to not be released. If the futex is created in a temporary file in an ext3 file system, this bug causes the file's inode to become an "undead" orphan, which causes an oops from a BUG_ON() in ext3_put_super() when the file system is unmounted. glibc's test suite is known to trigger this, see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>. The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's 38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on get_user_pages() for shared futexes". That commit made get_futex_key() also increment the reference count of the futex key, and updated its callers to decrement the key's reference count before returning. Unfortunately the normal exit path in futex_lock_pi() wasn't corrected: the reference count is incremented by get_futex_key() and queue_lock(), but the normal exit path only decrements once, via unqueue_me_pi(). The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31 this is easily done by 'goto out_put_key' rather than 'goto out'. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-19power: wakelock: Print active wakelocks when has_wake_lock() is calledMike Chan
When DEBUG_SUSPEND is enabled print active wakelocks when we check if there are any active wakelocks. In print_active_locks(), print expired wakelocks if DEBUG_EXPIRE is enabled Change-Id: Ib1cb795555e71ff23143a2bac7c8a58cbce16547 Signed-off-by: Mike Chan <mike@android.com>
2010-02-09NET: fix oops at bootime in sysctl codejamal
This fixes the boot time oops on the 2.6.32-stable tree. It is needed only in this tree due to the divergance from upstream. From: jamal <hadi@cyberus.ca> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09kernel/cred.c: use kmem_cache_freeJulia Lawall
commit b8a1d37c5f981cdd2e83c9fd98198832324cd57a upstream. Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather than kfree. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,E,c; @@ x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...) ... when != x = E when != &x ?-kfree(x) +kmem_cache_free(c,x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Steve Dickson <steved@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09clocksource: fix compilation if no GENERIC_TIMEAaro Koskinen
commit a362c638bdf052bf424bce7645d39b101090f6ba upstream Commit a9238ce3bb0fda6e760780b702c6cbd3793087d3 broke compilation on platforms that do not implement GENERIC_TIME (e.g. iop32x): kernel/time/clocksource.c: In function 'clocksource_register': kernel/time/clocksource.c:556: error: implicit declaration of function 'clocksource_max_deferment' Provide the implementation of clocksource_max_deferment() also for such platforms. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-08kernel: printk: Add non exported function for clearing the log ring bufferSan Mehat
Signed-off-by: San Mehat <san@google.com>
2010-02-08printk: Fix log_buf_copy termination.Arve Hjønnevåg
If idx was non-zero and the log had wrapped, len did not get truncated to stop at the last byte written to the log.
2010-02-08Revert "printk: remove unused code from kernel/printk.c"Arve Hjønnevåg
This reverts commit acff181d3574244e651913df77332e897b88bff4.
2010-02-08cgroup: Add generic cgroup subsystem permission checks.San Mehat
Rather than using explicit euid == 0 checks when trying to move tasks into a cgroup via CFS, move permission checks into each specific cgroup subsystem. If a subsystem does not specify a 'can_attach' handler, then we fall back to doing our checks the old way. This way non-root processes can add arbitrary processes to a cgroup if all the registered subsystems on that cgroup agree. Also change explicit euid == 0 check to CAP_SYS_ADMIN Signed-off-by: San Mehat <san@google.com>
2010-02-03PM: earlysuspend: Removing dependence on console.Rebecca Schultz
Rather than signaling a full update of the display from userspace via a console switch, this patch introduces 2 files int /sys/power, wait_for_fb_sleep and wait_for_fb_wake. Reading these files will block until the requested state has been entered. When a read from wait_for_fb_sleep returns userspace should stop drawing. When wait_for_fb_wake returns, it should do a full update. If either are called when the fb driver is already in the requested state, they will return immediately. Signed-off-by: Rebecca Schultz <rschultz@google.com> Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03consoleearlysuspend: Fix for 2.6.32Arve Hjønnevåg
vt_waitactive now needs a 1 based console number Change-Id: I07ab9a3773c93d67c09d928c8d5494ce823ffa2e
2010-02-03PM: earlysuspend: Add console switch when user requested sleep state changes.Arve Hjønnevåg
Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03PM: wakelock: Don't dump unfrozen task list when aborting ↵Arve Hjønnevåg
try_to_freeze_tasks after less than one second Change-Id: Ib2976e5b97a5ee4ec9abd4d4443584d9257d0941 Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03PM: wakelock: Abort task freezing if a wake lock is held.Arve Hjønnevåg
Avoids a problem where the device sometimes hangs for 20 seconds before the screen is turned on.
2010-02-03PM: Add user-space wake lock api.Arve Hjønnevåg
This adds /sys/power/wake_lock and /sys/power/wake_unlock. Writing a string to wake_lock creates a wake lock the first time is sees a string and locks it. Optionally, the string can be followed by a timeout. To unlock the wake lock, write the same string to wake_unlock.
2010-02-03PM: Enable early suspend through /sys/power/stateArve Hjønnevåg
If EARLYSUSPEND is enabled then writes to /sys/power/state no longer blocks, and the kernel will try to enter the requested state every time no wakelocks are held. Write "on" to resume normal operation.
2010-02-03PM: Implement early suspend apiArve Hjønnevåg
2010-02-03PM: wakelocks: Use seq_file for /proc/wakelocks so we can get more than 3K ↵Arve Hjønnevåg
of stats. Change-Id: I42ed8bea639684f7a8a95b2057516764075c6b01 Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03power: wakelocks: fix buffer overflow in print_wake_locksErik Gilling
Change-Id: Ic944e3b3d3bc53eddc6fd0963565fd072cac373c Signed-off-by: Erik Gilling <konkers@android.com>
2010-02-03power: Prevent spinlock recursion when wake_unlock() is calledMike Chan
Signed-off-by: Mike Chan <mike@android.com>
2010-02-03PM: Implement wakelock api.Arve Hjønnevåg
PM: wakelock: Replace expire work with a timer The expire work function did not work in the normal case. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03sched: make task dump print all 15 chars of proc commErik Gilling
Change-Id: I1a5c9676baa06c9f9b4424bbcab01b9b2fbfcd99 Signed-off-by: Erik Gilling <konkers@android.com>
2010-02-03sched: Enable might_sleep before initializing drivers.Arve Hjønnevåg
This allows detection of init bugs in built-in drivers. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03Add build option to to set the default panic timeout.Arve Hjønnevåg
2010-02-03futex: Restore one of the fast paths eliminated by ↵Arve Hjønnevåg
38d47c1b7075bd7ec3881141bb3629da58f88dab This improves futex performance until our user-space code is fixed to use FUTEX_PRIVATE_FLAG for non-shared futexes. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03mm: Add min_free_order_shift tunable.Arve Hjønnevåg
By default the kernel tries to keep half as much memory free at each order as it does for one order below. This can be too agressive when running without swap. Signed-off-by: Arve Hjønnevåg <arve@android.com>
2010-02-03ARM: Make low-level printk workTony Lindgren
Makes low-level printk work. Signed-off-by: Tony Lindgren <tony@atomide.com>
2010-02-03sched: Fix task priority bugPeter Zijlstra
83f9ac removed a call to effective_prio() in wake_up_new_task(), which leads to tasks running at MAX_PRIO. This is caused by the idle thread being set to MAX_PRIO before forking off init. O(1) used that to make sure idle was always preempted, CFS uses check_preempt_curr_idle() for that so we can savely remove this bit of legacy code. Reported-by: Mike Galbraith <efault@gmx.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1259754383.4003.610.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-28timers, init: Limit the number of per cpu calibration bootup messagesMike Travis
commit feae3203d711db0a9965300ee6d592257fdaae4f upstream. Limit the number of per cpu calibration messages by only printing out results for the first cpu to boot. Also, don't print "CPUx is down" as this is expected, and we don't need 4096 reminders... ;-) Signed-off-by: Mike Travis <travis@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28nohz: Prevent clocksource wrapping during idleJon Hunter
commit 98962465ed9e6ea99c38e0af63fe1dcb5a79dc25 upstream. The dynamic tick allows the kernel to sleep for periods longer than a single tick, but it does not limit the sleep time currently. In the worst case the kernel could sleep longer than the wrap around time of the time keeping clock source which would result in losing track of time. Prevent this by limiting it to the safe maximum sleep time of the current time keeping clock source. The value is calculated when the clock source is registered. [ tglx: simplified the code a bit and massaged the commit msg ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28sched: Fix missing sched tunable recalculation on cpu add/removeChristian Ehrhardt
commit 0bcdcf28c979869f44e05121b96ff2cfb05bd8e6 upstream. Based on Peter Zijlstras patch suggestion this enables recalculation of the scheduler tunables in response of a change in the number of cpus. It also adds a max of eight cpus that are considered in that scaling. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1259579808-11357-2-git-send-email-ehrhardt@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28sched: Fix isolcpus boot optionRusty Russell
commit bdddd2963c0264c56f18043f6fa829d3c1d3d1c0 upstream. Anton Blanchard wrote: > We allocate and zero cpu_isolated_map after the isolcpus > __setup option has run. This means cpu_isolated_map always > ends up empty and if CPUMASK_OFFSTACK is enabled we write to a > cpumask that hasn't been allocated. I introduced this regression in 49557e620339cb13 (sched: Fix boot crash by zalloc()ing most of the cpu masks). Use the bootmem allocator if they set isolcpus=, otherwise allocate and zero like normal. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: peterz@infradead.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Anton Blanchard <anton@samba.org>
2010-01-28clockevents: Add missing include to pacify sparseH Hartley Sweeten
commit 8e1a928a2ed7e8d5cad97c8e985294b4caedd168 upstream. Include "tick-internal.h" in order to pick up the extern function prototype for clockevents_shutdown(). This quiets the following sparse build noise: warning: symbol 'clockevents_shutdown' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE190901E24550@mi8nycmail19.Mi8.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Cc: johnstul@us.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28clockevent: Don't remove broadcast device when cpu is deadXiaotian Feng
commit ea9d8e3f45404d411c00ae67b45cc35c58265bb7 upstream. Marc reported that the BUG_ON in clockevents_notify() triggers on his system. This happens because the kernel tries to remove an active clock event device (used for broadcasting) from the device list. The handling of devices which can be used as per cpu device and as a global broadcast device is suboptimal. The simplest solution for now (and for stable) is to check whether the device is used as global broadcast device, but this needs to be revisited. [ tglx: restored the cpuweight check and massaged the changelog ] Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25perf: Honour event state for aux stream dataPeter Zijlstra
commit 22e190851f8709c48baf00ed9ce6144cdc54d025 upstream. Anton reported that perf record kept receiving events even after calling ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP events didn't respect the disabled state and kept flowing in. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Anton Blanchard <anton@samba.org> LKML-Reference: <1263459187.4244.265.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25perf events: Dont report side-band events on each cpu for per-task-per-cpu ↵Peter Zijlstra
events commit 5d27c23df09b702868d9a3bff86ec6abd22963ac upstream. Acme noticed that his FORK/MMAP numbers were inflated by about the same factor as his cpu-count. This led to the discovery of a few more sites that need to respect the event->cpu filter. Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20091217121830.215333434@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22sched: Fix task priority bugPeter Zijlstra
commit 57785df5ac53c70da9fb53696130f3c551bfe1f9 upstream. 83f9ac removed a call to effective_prio() in wake_up_new_task(), which leads to tasks running at MAX_PRIO. This is caused by the idle thread being set to MAX_PRIO before forking off init. O(1) used that to make sure idle was always preempted, CFS uses check_preempt_curr_idle() for that so we can savely remove this bit of legacy code. Reported-by: Mike Galbraith <efault@gmx.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1259754383.4003.610.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCKDavid Miller
commit b9f8fcd55bbdb037e5332dbdb7b494f0b70861ac upstream. Relax stable-sched-clock architectures to not save/disable/restore hardirqs in cpu_clock(). The background is that I was trying to resolve a sparc64 perf issue when I discovered this problem. On sparc64 I implement pseudo NMIs by simply running the kernel at IRQ level 14 when local_irq_disable() is called, this allows performance counter events to still come in at IRQ level 15. This doesn't work if any code in an NMI handler does local_irq_save() or local_irq_disable() since the "disable" will kick us back to cpu IRQ level 14 thus letting NMIs back in and we recurse. The only path which that does that in the perf event IRQ handling path is the code supporting frequency based events. It uses cpu_clock(). cpu_clock() simply invokes sched_clock() with IRQs disabled. And that's a fundamental bug all on it's own, particularly for the HAVE_UNSTABLE_SCHED_CLOCK case. NMIs can thus get into the sched_clock() code interrupting the local IRQ disable code sections of it. Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ disabling done by cpu_clock() is just pure overhead and completely unnecessary. So the core problem is that sched_clock() is not NMI safe, but we are invoking it from NMI contexts in the perf events code (via cpu_clock()). A less important issue is the overhead of IRQ disabling when it isn't necessary in cpu_clock(). CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not affected by this patch. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091213.182502.215092085.davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22futexes: Remove rw parameter from get_futex_key()KOSAKI Motohiro
commit 7485d0d3758e8e6491a5c9468114e74dc050785d upstream. Currently, futexes have two problem: A) The current futex code doesn't handle private file mappings properly. get_futex_key() uses PageAnon() to distinguish file and anon, which can cause the following bad scenario: 1) thread-A call futex(private-mapping, FUTEX_WAIT), it sleeps on file mapping object. 2) thread-B writes a variable and it makes it cow. 3) thread-B calls futex(private-mapping, FUTEX_WAKE), it wakes up blocked thread on the anonymous page. (but it's nothing) B) Current futex code doesn't handle zero page properly. Read mode get_user_pages() can return zero page, but current futex code doesn't handle it at all. Then, zero page makes infinite loop internally. The solution is to use write mode get_user_page() always for page lookup. It prevents the lookup of both file page of private mappings and zero page. Performance concerns: Probaly very little, because glibc always initialize variables for futex before to call futex(). It means glibc users never see the overhead of this patch. Compatibility concerns: This patch has few compatibility issues. After this patch, FUTEX_WAIT require writable access to futex variables (read-only mappings makes EFAULT). But practically it's not a problem, glibc always initalizes variables for futexes explicitly - nobody uses read-only mappings. Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <dvhltc@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Ulrich Drepper <drepper@gmail.com> LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=yRusty Russell
commit d4703aefdbc8f9f347f6dcefcddd791294314eb7 upstream. powerpc applies relocations to the kcrctab. They're absolute symbols, but it's not completely unreasonable: other archs may too, but the relocation is often 0. http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/077972.html Inspired-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18fix more leaks in audit_tree.c tag_chunk()Al Viro
commit b4c30aad39805902cf5b855aa8a8b22d728ad057 upstream. Several leaks in audit_tree didn't get caught by commit 318b6d3d7ddbcad3d6867e630711b8a705d873d7, including the leak on normal exit in case of multiple rules refering to the same chunk. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>