path: root/arch
Age | Commit message | Author
2014-02-10 | MCE: Fix vm86 handling for 32bit mce handler | Andi Kleen
commit a129a7c84582629741e5fa6f40026efcd7a65bd4 upstream. When running on 32bit the mce handler could misinterpret vm86 mode as ring 0. This can affect whether it does recovery or not; it was possible to panic when recovery was actually possible. Fix this by always forcing vm86 to look like ring 3. [ Backport to 3.0 notes: Things changed there slightly: - move mce_get_rip() up. It fills in the m->cs and m->ip values which are evaluated in mce_severity(). Therefore move it up right before the mce_severity call. This seems to be another bug in 3.0? - Place the backport (fix m->cs in V86 case) where m->cs gets filled, which is mce_get_rip() in 3.0 ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: commit 8ef8fa7479fff9313387b873413f5ae233a2bd04 in v3.0.44] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
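The idea of "forcing vm86 to look like ring 3" can be sketched in plain C. This is only an illustration, not the kernel's code: the helper names `effective_cs` and `is_user_mode` are hypothetical, and the sketch assumes the architectural facts that the VM flag is bit 17 of EFLAGS and that the low two bits of the code segment selector carry the privilege level.

```c
#include <stdint.h>

/* Bit 17 of EFLAGS is the virtual-8086 mode flag. */
#define EFLAGS_VM (1u << 17)

/*
 * Hypothetical helper: return the code segment value that a severity
 * check should look at. In vm86 mode the selector holds a real-mode
 * style segment, so its low bits say nothing about privilege; force
 * them to 3 so the task is always treated as user mode.
 */
static uint16_t effective_cs(uint16_t cs, uint32_t eflags)
{
	if (eflags & EFLAGS_VM)
		cs = (uint16_t)(cs | 3);
	return cs;
}

/* Hypothetical helper: privilege level 3 means user mode. */
static int is_user_mode(uint16_t cs)
{
	return (cs & 3) == 3;
}
```

Without the adjustment, a vm86 task whose segment value happens to have its low two bits clear would be mistaken for ring 0.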
2014-02-10 | KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) | Andy Honig
commit c300aa64ddf57d9c5d9c898a64b36877345dd4a9 upstream. If the guest sets the GPA of the time_page so that the request to update the time straddles a page then KVM will write onto an incorrect page. The write is done by using kmap_atomic to get a pointer to the page for the time structure and then performing a memcpy to that page starting at an offset that the guest controls. Well-behaved guests always provide a 32-byte aligned address, however a malicious guest could use this to corrupt host kernel memory. Tested: Tested against kvmclock unit test. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
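To make the straddling condition concrete, here is a small user-space sketch. It is not the KVM patch itself: the function names are hypothetical, and the 4096-byte page size and 32-byte structure size are assumptions taken from the alignment figure quoted in the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */
#define TIME_INFO_SIZE  32u   /* assumed size of the time structure */

/* True if a write of len bytes starting at gpa crosses a page boundary. */
static bool write_straddles_page(uint64_t gpa, size_t len)
{
	uint64_t offset = gpa & (PAGE_SIZE_BYTES - 1);

	return offset + len > PAGE_SIZE_BYTES;
}

/*
 * Hypothetical validation: accept only addresses aligned to the
 * structure size. A 32-byte aligned, 32-byte write can never cross
 * a 4096-byte page boundary.
 */
static bool time_gpa_is_acceptable(uint64_t gpa)
{
	return (gpa & (TIME_INFO_SIZE - 1)) == 0;
}
```

Rejecting misaligned addresses up front is one way to guarantee that the later memcpy stays inside the mapped page.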
2014-02-10 | xen/bootup: allow {read|write}_cr8 pvops call. | Konrad Rzeszutek Wilk
commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream. We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when a user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10 | xen/bootup: allow read_tscp call for Xen PV guests. | Konrad Rzeszutek Wilk
commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream. The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10 | x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates | Samu Kallio
commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream. In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops when lazy MMU updates are enabled, because set_pgd effects are being deferred. One instance of this problem is during process mm cleanup with memory cgroups enabled. The chain of events is as follows: - zap_pte_range enables lazy MMU updates - zap_pte_range eventually calls mem_cgroup_charge_statistics, which accesses the vmalloc'd mem_cgroup per-cpu stat area - vmalloc_fault is triggered which tries to sync the corresponding PGD entry with set_pgd, but the update is deferred - vmalloc_fault oopses due to a mismatch in the PUD entries The OOPs usually looks as so: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/fault.c:396! invalid opcode: 0000 [#1] SMP .. snip .. CPU 1 Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1 RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208 .. snip .. Call Trace: [<ffffffff81627759>] do_page_fault+0x399/0x4b0 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110 [<ffffffff81624065>] page_fault+0x25/0x30 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870 [<ffffffff81154962>] unmap_vmas+0x52/0xa0 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170 [<ffffffff810050d9>] ? 
__raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81059ce3>] mmput+0x83/0xf0 [<ffffffff810624c4>] exit_mm+0x104/0x130 [<ffffffff8106264a>] do_exit+0x15a/0x8c0 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0 [<ffffffff81063177>] sys_exit_group+0x17/0x20 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the changes visible to the consistency checks. RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737 Tested-by: Josh Boyer <jwboyer@redhat.com> Reported-and-Tested-by: Krishna Raman <kraman@redhat.com> Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com> Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10 | x86/mm: Check if PUD is large when validating a kernel address | Mel Gorman
commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream. A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.coM> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
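The missing check can be illustrated with a toy model of one step of a page table walk. This is a simplified sketch, not kern_addr_valid(): the names `entry_is_large` and `classify_pud` are hypothetical, and the bit positions assume the usual x86 layout where bit 0 is "present" and bit 7 is the page size (PSE) bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT (1ULL << 0) /* entry maps something */
#define ENTRY_PSE     (1ULL << 7) /* entry maps a large page directly */

enum walk_step {
	WALK_INVALID, /* nothing mapped here */
	WALK_LEAF,    /* large page: the entry holds a physical address */
	WALK_DESCEND  /* entry points to the next-level table */
};

/* A large entry must be both present and have the page size bit set. */
static bool entry_is_large(uint64_t entry)
{
	return (entry & (ENTRY_PRESENT | ENTRY_PSE)) ==
	       (ENTRY_PRESENT | ENTRY_PSE);
}

/*
 * Decide what to do with a PUD-level entry. Without the large-page
 * test, a 1G mapping would fall through to WALK_DESCEND and its
 * physical address would be misread as a pointer to a PMD table.
 */
static enum walk_step classify_pud(uint64_t pud)
{
	if (!(pud & ENTRY_PRESENT))
		return WALK_INVALID;
	if (entry_is_large(pud))
		return WALK_LEAF;
	return WALK_DESCEND;
}
```

A walker built on this would stop at WALK_LEAF and report the address as valid instead of dereferencing the next level.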
2014-02-10 | x86, tls: Off by one limit check | Dan Carpenter
commit 8f0750f19789cf352d7e24a6cc50f2ab1b4f1372 upstream. These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members so GDT_ENTRY_TLS_ENTRIES is one past the end of the array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
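The class of bug described here is easy to show in isolation. The sketch below is a generic illustration, not the patched kernel code: `TLS_ENTRIES` stands in for GDT_ENTRY_TLS_ENTRIES, its value of 3 is an assumption for the example, and both function names are hypothetical.

```c
#include <stdbool.h>

#define TLS_ENTRIES 3 /* stand-in for GDT_ENTRY_TLS_ENTRIES (assumed value) */

/* Buggy form: lets idx == TLS_ENTRIES through, one past the end. */
static bool tls_index_valid_buggy(int idx)
{
	return idx >= 0 && idx <= TLS_ENTRIES;
}

/* Fixed form: valid indices for an N-element array are 0 .. N-1. */
static bool tls_index_valid(int idx)
{
	return idx >= 0 && idx < TLS_ENTRIES;
}
```

The two checks differ only for the single value equal to the array length, which is why such bugs tend to survive casual testing.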
2014-02-10 | x86/msr: Add capabilities check | Alan Cox
commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream. At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitrary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10 | x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. | Jan Beulich
commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream. This fixes CVE-2013-0228 / XSA-42 Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest user in a 32bit PV guest can crash the guest with a panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [<c08476df>] ? panic+0x6e/0x122 [<c084b63c>] ? oops_end+0xbc/0xd0 [<c084b260>] ? do_general_protection+0x0/0x210 [<c084a9b7>] ? error_code+0x73/ ------------- Petr says: "I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer." Jan took a look at the preliminary patch and came up with a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that we can only rely on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to rely on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks' attention and lead to accidents. Reviewed-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Petr Matousek <pmatouse@redhat.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-02-10 | x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT | Romain Francoise
Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the Kconfig entry was not changed to match when upstream commit 628c6246d47b85f5357298601df2444d7f4dd3fd ("x86, random: Architectural inlines to get random integers with RDRAND") was backported. Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | x86: Don't use the EFI reboot method by default | Matthew Garrett
commit f70e957cda22d309c769805cbb932407a5232219 upstream. Testing suggests that at least some Lenovos and some Intels will fail to reboot via EFI, attempting to jump to an unmapped physical address. In the long run we could handle this by providing a page table with a 1:1 mapping of physical addresses, but for now it's probably just easier to assume that ACPI or legacy methods will be present and reboot via those. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> [PG: in 2.6.34, file is x86/platform/efi/efi.c --> x86/kernel/efi.c] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | x86: Get rid of asmregparm | Richard Weinberger
commit 1b4ac2a935aaf194241a2f4165d6407ba9650e1a upstream. As UML no longer needs asmregparm we can remove it. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: namhyung@gmail.com Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: dhowells@redhat.com Link: http://lkml.kernel.org/r/%3C1306189085-29896-1-git-send-email-richard%40nod.at%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | um: Use RWSEM_GENERIC_SPINLOCK on x86 | Richard Weinberger
commit 3a3679078aed2c451ebc32836bbd3b8219a65e01 upstream. Commit d12337 (rwsem: Remove redundant asmregparm annotation) broke rwsem on UML. As we cannot compile UML with -mregparm=3 and keeping asmregparm only for UML is inadequate, the easiest solution is using RWSEM_GENERIC_SPINLOCK. Thanks to Thomas Gleixner for the idea. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Richard Weinberger <richard@nod.at> Cc: user-mode-linux-devel@lists.sourceforge.net Link: http://lkml.kernel.org/r/%3C1306183893-26655-1-git-send-email-richard%40nod.at%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | rwsem: Remove redundant asmregparm annotation | Thomas Gleixner
commit d123375425d7df4b6081a631fc1203fceafa59b2 upstream. Peter Zijlstra pointed out that the only user of asmregparm (x86) is already compiling the kernel with -mregparm=3. So the annotation of the rwsem functions is redundant. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Miller <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> LKML-Reference: <alpine.LFD.2.00.1101262130450.31804@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [PG: fixes compile errors when using newer gcc on 2.6.34 baseline] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | KVM: x86: Prevent starting PIT timers in the absence of irqchip support | Jan Kiszka
commit 0924ab2cfa98b1ece26c033d696651fd62896c69 upstream. User space may create the PIT and forget to set up the irqchips. In that case, firing PIT IRQs will crash the host: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm] ... Call Trace: [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm] [<ffffffff81071431>] process_one_work+0x111/0x4d0 [<ffffffff81071bb2>] worker_thread+0x152/0x340 [<ffffffff81075c8e>] kthread+0x7e/0x90 [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10 Prevent this by checking the irqchip mode before starting a timer. We can't deny creating the PIT if the irqchips aren't set up yet as current user land expects this order to work. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | oprofile, x86: Fix nmi-unsafe callgraph support | Robert Richter
commit a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e upstream. Backport for stable kernel v2.6.32.y to v2.6.36.y. Current oprofile's x86 callgraph support may trigger page faults throwing the BUG_ON(in_nmi()) message below. This patch fixes this by using the same nmi-safe copy-from-user code as in perf. ------------[ cut here ]------------ kernel BUG at .../arch/x86/kernel/traps.c:436! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:07:00.0/0000:08:04.0/net/eth0/broadcast CPU 5 Modules linked in: Pid: 8611, comm: opcontrol Not tainted 2.6.39-00007-gfe47ae7 #1 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff813e8e35>] [<ffffffff813e8e35>] do_nmi+0x22/0x1ee RSP: 0000:ffff88042fd47f28 EFLAGS: 00010002 RAX: ffff88042c0a7fd8 RBX: 0000000000000001 RCX: 00000000c0000101 RDX: 00000000ffff8804 RSI: ffffffffffffffff RDI: ffff88042fd47f58 RBP: ffff88042fd47f48 R08: 0000000000000004 R09: 0000000000001484 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88042fd47f58 R13: 0000000000000000 R14: ffff88042fd47d98 R15: 0000000000000020 FS: 00007fca25e56700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000074 CR3: 000000042d28b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process opcontrol (pid: 8611, threadinfo ffff88042c0a6000, task ffff88042c532310) Stack: 0000000000000000 0000000000000001 ffff88042c0a7fd8 0000000000000000 ffff88042fd47de8 ffffffff813e897a 0000000000000020 ffff88042fd47d98 0000000000000000 ffff88042c0a7fd8 ffff88042fd47de8 0000000000000074 Call Trace: <NMI> [<ffffffff813e897a>] nmi+0x1a/0x20 [<ffffffff813f08ab>] ? 
bad_to_user+0x25/0x771 <<EOE>> Code: ff 59 5b 41 5c 41 5d c9 c3 55 65 48 8b 04 25 88 b5 00 00 48 89 e5 41 55 41 54 49 89 fc 53 48 83 ec 08 f6 80 47 e0 ff ff 04 74 04 <0f> 0b eb fe 81 80 44 e0 ff ff 00 00 01 04 65 ff 04 25 c4 0f 01 RIP [<ffffffff813e8e35>] do_nmi+0x22/0x1ee RSP <ffff88042fd47f28> ---[ end trace ed6752185092104b ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 8611, comm: opcontrol Tainted: G D 2.6.39-00007-gfe47ae7 #1 Call Trace: <NMI> [<ffffffff813e5e0a>] panic+0x8c/0x188 [<ffffffff813e915c>] oops_end+0x81/0x8e [<ffffffff8100403d>] die+0x55/0x5e [<ffffffff813e8c45>] do_trap+0x11c/0x12b [<ffffffff810023c8>] do_invalid_op+0x91/0x9a [<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee [<ffffffff8131e6fa>] ? oprofile_add_sample+0x83/0x95 [<ffffffff81321670>] ? op_amd_check_ctrs+0x4f/0x2cf [<ffffffff813ee4d5>] invalid_op+0x15/0x20 [<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee [<ffffffff813e8e7a>] ? do_nmi+0x67/0x1ee [<ffffffff813e897a>] nmi+0x1a/0x20 [<ffffffff813f08ab>] ? bad_to_user+0x25/0x771 <<EOE>> Cc: John Lumby <johnlumby@hotmail.com> Cc: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | ARM: davinci: dm646x evm: wrong register used in setup_vpif_input_channel_mode | Hans Verkuil
commit 83713fc9373be2e943f82e9d36213708c6b0050e upstream. The function setup_vpif_input_channel_mode() used the VSCLKDIS register instead of VIDCLKCTL. This meant that when in HD mode videoport channel 0 used a different clock from channel 1. Clearly a copy-and-paste error. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Manjunath Hadli <manjunath.hadli@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | oprofile, x86: Fix crash when unloading module (nmi timer mode) | Robert Richter
commit 97f7f8189fe54e3cfe324ef9ad35064f3d2d3bff upstream. If oprofile uses the nmi timer interrupt there is a crash while unloading the module. The bug can be triggered with oprofile build as module and kernel parameter nolapic set. This patch fixes this. oprofile: using NMI timer interrupt. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 PGD 42dbca067 PUD 41da6a067 PMD 0 Oops: 0002 [#1] PREEMPT SMP CPU 5 Modules linked in: oprofile(-) [last unloaded: oprofile] Pid: 2518, comm: modprobe Not tainted 3.1.0-rc7-00019-gb2fb49d #19 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff8123c226>] [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP: 0018:ffff88041ef71e98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffffffa0017100 RCX: dead000000200200 RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620 RBP: ffff88041ef71ea8 R08: 0000000000000001 R09: 0000000000000082 R10: 0000000000000000 R11: ffff88041ef71de8 R12: 0000000000000080 R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210 FS: 00007fc902f20700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 000000041cdb6000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2518, threadinfo ffff88041ef70000, task ffff88041d348040) Stack: ffff88041ef71eb8 ffffffffa0017790 ffff88041ef71eb8 ffffffffa0013532 ffff88041ef71ec8 ffffffffa00132d6 ffff88041ef71ed8 ffffffffa00159b2 ffff88041ef71f78 ffffffff81073115 656c69666f72706f 0000000000610200 Call Trace: [<ffffffffa0013532>] op_nmi_exit+0x15/0x17 [oprofile] [<ffffffffa00132d6>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa00159b2>] oprofile_exit+0x1e/0x20 [oprofile] [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f [<ffffffff811bf09e>] ? 
trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b RIP [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP <ffff88041ef71e98> CR2: 0000000000000008 ---[ end trace 43a541a52956b7b0 ]--- Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | x86/mpparse: Account for bus types other than ISA and PCI | Bjorn Helgaas
commit 9e6866686bdf2dcf3aeb0838076237ede532dcc8 upstream. In commit f8924e770e04 ("x86: unify mp_bus_info"), the 32-bit and 64-bit versions of MP_bus_info were rearranged to match each other better. Unfortunately it introduced a regression: prior to that change we used to always set the mp_bus_not_pci bit, then clear it if we found a PCI bus. After it, we set mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave it alone otherwise. In the cases of ISA and PCI, there's not much difference. But ISA is not the only non-PCI bus, so it's better to always set mp_bus_not_pci and clear it only for PCI. Without this change, Dan's Dell PowerEdge 4200 panics on boot with a log indicating interrupt routing trouble unless the "noapic" option is supplied. With this change, the machine boots reliably without "noapic". Fixes http://bugs.debian.org/586494 Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dan McGrath <troubledaemon@gmail.com> Cc: Alexey Starikovskiy <aystarik@gmail.com> [jrnieder@gmail.com: clarified commit message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
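The "set by default, clear only for PCI" logic can be sketched as follows. This is a simplified stand-alone model rather than the mpparse code: the array, the `record_bus` helper, the `MAX_BUSES` limit, and the plain string comparison are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_BUSES 8 /* hypothetical limit for the example */

/* One flag per bus id: true means "this bus is not PCI". */
static bool bus_not_pci[MAX_BUSES];

/*
 * Hypothetical helper: record the type of a bus. Every bus starts out
 * marked as non-PCI, and the mark is removed only for PCI, so bus
 * types other than ISA and PCI are covered without being listed.
 */
static void record_bus(int busid, const char *type)
{
	bus_not_pci[busid] = true;
	if (strcmp(type, "PCI") == 0)
		bus_not_pci[busid] = false;
}
```

With the regressed logic (set for ISA, clear for PCI, untouched otherwise), a bus of any third type would keep whatever value the flag had before.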
2013-01-16 | sched, x86: Avoid unnecessary overflow in sched_clock | Salman Qazi
commit 4cecf6d401a01d054afc1e5f605bcbfe553cb9b9 upstream. (Added the missing signed-off-by line) In hundreds of days, the __cycles_2_ns calculation in sched_clock has an overflow. cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing the final value to become zero. We can solve this without losing any precision. We can decompose TSC into quotient and remainder of division by the scale factor, and then use this to convert TSC into nanoseconds. Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
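The quotient/remainder decomposition described above can be written out in a few lines. This is a user-space sketch under stated assumptions, not the kernel function: the shift of 10 and both function names are chosen for illustration, and the scale factor is passed in as a parameter instead of being read from per-CPU data.

```c
#include <stdint.h>

#define SCALE_SHIFT 10 /* assumed fixed-point shift for the example */

/* Naive form: cyc * scale can wrap around 64 bits for large cyc. */
static uint64_t cycles_to_ns_naive(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> SCALE_SHIFT;
}

/*
 * Split form: write cyc = quot * 2^SCALE_SHIFT + rem. Then
 * (cyc * scale) >> SCALE_SHIFT equals
 * quot * scale + ((rem * scale) >> SCALE_SHIFT) exactly, and the
 * intermediate products are smaller by a factor of 2^SCALE_SHIFT.
 */
static uint64_t cycles_to_ns_split(uint64_t cyc, uint64_t scale)
{
	uint64_t quot = cyc >> SCALE_SHIFT;
	uint64_t rem = cyc & ((1ULL << SCALE_SHIFT) - 1);

	return quot * scale + ((rem * scale) >> SCALE_SHIFT);
}
```

For small inputs both forms agree; for a cycle count such as 2^60 with a scale of 341, the naive product wraps while the split form still returns 341 * 2^50.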
2013-01-16 | ARM: 7161/1: errata: no automatic store buffer drain | Will Deacon
commit 11ed0ba1754841316d4095478944300acf19acc3 upstream. This patch implements a workaround for PL310 erratum 769419. On revisions of the PL310 prior to r3p2, the Store Buffer does not automatically drain. This can cause normal, non-cacheable writes to be retained when the memory system is idle, leading to suboptimal I/O performance for drivers using coherent DMA. This patch adds an optional wmb() call to the cpu_idle loop. On systems with an outer cache, this causes an explicit flush of the store buffer. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-01-16 | x86, ioapic: initialize nr_ioapic_registers early in mp_register_ioapic() | Suresh Siddha
Lin Bao reported that one of the HP platforms failed to boot the 2.6.32 kernel when the BIOS enabled interrupt-remapping and x2apic before handing over control to the Linux kernel. During boot, the Linux kernel masks all the interrupt sources (8259, IO-APIC RTE's), sets up the interrupt-remapping hardware with the OS controlled table and unmasks the 8259 interrupts but not the IO-APIC RTE's (as the newly set up interrupt-remapping table and the IO-APIC RTE's are not yet programmed by the kernel). Shortly after this, IO-APIC RTE's and the interrupt-remapping table entries are programmed based on the ACPI tables etc. So the expectation is that any interrupt during this window will be dropped and not see the intermediate configuration. In the reported problematic case, BIOS has configured the IO-APIC in virtual wire-B mode. In the window between the kernel setting up the new interrupt-remapping table and the IO-APIC RTE's being properly configured, an interrupt gets routed by the IO-APIC RTE (set up by the virtual wire-B configuration) and sees the empty interrupt-remapping table entry, resulting in a VT-d fault that causes the platform to generate an NMI. And the OS panics on this unexpected NMI. This problem doesn't happen with more recent kernels and a closer look at the 2.6.32 kernel shows that the code which masks the IO-APIC RTE's is not working as expected, as the nr_ioapic_registers for each IO-APIC is not yet initialized at this point. In the later kernels we initialize nr_ioapic_registers much earlier and everything works as expected. For 2.6.[32..34] kernels, fix this issue by initializing nr_ioapic_registers early in mp_register_ioapic() [ Relevant upstream commit info: commit 7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af Author: Eric W. Biederman <ebiederm@xmission.com> Date: Tue Mar 30 01:07:12 2010 -0700 x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. 
As the upstream commit depends on quite a few prior commits and some followup fixes in the mainline, we just picked the smallest relevant hunk for fixing the issue at hand. Problematic platform uses ACPI for IO-APIC, VT-d enumeration etc. and this hunk only touches the ACPI based platforms. nr_ioapic_registers initialization in enable_IO_APIC() is still retained, so that other configurations like legacy MPS table based enumeration etc. work with no change. ] Reported-and-tested-by: Zhang, Lin-Bao <linbao.zhang@hp.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17 | random: remove rand_initialize_irq() | Theodore Ts'o
commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream. With the new interrupt sampling system, we are no longer using the timer_rand_state structure in the irq descriptor, so we can stop initializing it now. [ Merged in fixes from Sedat to find some last missing references to rand_initialize_irq() ] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> [PG: in .34 the irqdesc.h content is in irq.h instead.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17 | x86, random: Architectural inlines to get random integers with RDRAND | H. Peter Anvin
commit 628c6246d47b85f5357298601df2444d7f4dd3fd upstream. Architectural inlines to get random ints and longs using the RDRAND instruction. Intel has introduced a new RDRAND instruction, a Digital Random Number Generator (DRNG), which is functionally a high-bandwidth entropy source, cryptographic whitener, and integrity monitor all built into hardware. This enables RDRAND to be used directly, bypassing the kernel random number pool. For technical documentation, see: http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/ In this patch, this is *only* used for the nonblocking random number pool. RDRAND is a nonblocking source, similar to our /dev/urandom, and is therefore not a direct replacement for /dev/random. The architectural hooks presented in the previous patch only feed the kernel internal users, which only use the nonblocking pool, and so this is not a problem. Since this instruction is available in userspace, there is no reason to have a /dev/hw_rng device driver for the purpose of feeding rngd. This is especially so since RDRAND is a nonblocking source, and needs additional whitening and reduction (see the above technical documentation for details) in order to be of "pure entropy source" quality. The CONFIG_EXPERT compile-time option can be used to disable this use of RDRAND. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Originally-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17x86, cpufeature: Update CPU feature RDRND to RDRANDKees Cook
commit 7ccafc5f75c87853f3c49845d5a884f2376e03ce upstream. The Intel manual changed the name of the CPUID bit to match the instruction name. We should follow suit for sanity's sake. (See Intel SDM Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".) [ hpa: we can only do this at this time because there are currently no CPUs with this feature on the market, hence this is pre-hardware enabling. However, Cc:'ing stable so that stable can present a consistent ABI. ] Signed-off-by: Kees Cook <kees.cook@canonical.com> Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17x86, cpu: Add CPU flags for F16C and RDRNDH. Peter Anvin
commit 24da9c26f3050aee9314ec09930a24c80fe76352 upstream. Add support for the newly documented F16C (16-bit floating point conversions) and RDRND (RDRAND instruction) CPU feature flags. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.Ian Campbell
commit f611f2da99420abc973c32cdbddbf5c365d0a20c upstream. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17um: fix ubd cow sizeRichard Weinberger
commit 8535639810e578960233ad39def3ac2157b0c3ec upstream. ubd_file_size() cannot use ubd_dev->cow.file because at this time ubd_dev->cow.file is not initialized. Therefore, ubd_file_size() will always report a wrong disk size when COW files are used. Reading from /dev/ubd* would crash the kernel. We have to read the correct disk size from the COW file's backing file. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17plat-mxc: iomux-v3.h: implicitly enable pull-up/down when that's desiredPaul Fertser
commit 6571534b600b8ca1936ff5630b9e0947f21faf16 upstream. To configure pads during the initialisation a set of special constants is used, e.g. #define MX25_PAD_FEC_MDIO__FEC_MDIO IOMUX_PAD(0x3c4, 0x1cc, 0x10, 0, 0, PAD_CTL_HYS | PAD_CTL_PUS_22K_UP) The problem is that no pull-up/down is getting activated unless both PAD_CTL_PUE (pull-up enable) and PAD_CTL_PKE (pull/keeper module enable) set. This is clearly stated in the i.MX25 datasheet and is confirmed by the measurements on hardware. This leads to some rather hard to understand bugs such as misdetecting an absent ethernet PHY (a real bug i had), unstable data transfer etc. This might affect mx25, mx35, mx50, mx51 and mx53 SoCs. It's reasonable to expect that if the pullup value is specified, the intention was to have it actually active, so we implicitly add the needed bits. Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
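A minimal sketch of the idea in this commit, assuming illustrative bit positions (the real values live in iomux-v3.h and may differ): a pull-up strength constant only takes effect when both the pull/keeper enable and pull-up enable bits are set, so the strength constant carries those bits itself. `PAD_CTL_PUS_22K_UP_RAW` and `pull_is_active()` are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define PAD_CTL_PKE            (1u << 7)  /* pull/keeper module enable */
#define PAD_CTL_PUE            (1u << 6)  /* pull-up enable */
#define PAD_CTL_PUS_22K_UP_RAW (3u << 4)  /* hypothetical: strength field alone */

/* Strength constant that implicitly adds the enable bits it depends on. */
#define PAD_CTL_PUS_22K_UP \
	(PAD_CTL_PUS_22K_UP_RAW | PAD_CTL_PUE | PAD_CTL_PKE)

/* Hypothetical check: the pull is only active with both enable bits set. */
static int pull_is_active(uint32_t pad_ctl)
{
	const uint32_t need = PAD_CTL_PKE | PAD_CTL_PUE;

	return (pad_ctl & need) == need;
}
```

With the raw strength value alone, `pull_is_active()` reports 0; with the combined constant it reports 1, which matches the behavior the commit message describes.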
2012-08-17iommu/amd: Fix wrong shift directionJoerg Roedel
commit fcd0861db1cf4e6ed99f60a815b7b72c2ed36ea4 upstream. The shift direction was wrong because the function takes a page number and i is the address in the loop. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> [PG: drivers/iommu/ was arch/x86/kernel/ in 2.6.34 context] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
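The address-versus-page-number mix-up can be sketched as follows. This is not the AMD IOMMU code; `mark_page_reserved()` and `reserve_range()` are hypothetical stand-ins, and 4 KiB pages are assumed. The loop variable is a byte address, so it must be shifted right to obtain the page number the callee expects.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* assumed 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical callee that expects a page number, not an address. */
static void mark_page_reserved(uint8_t *bitmap, uint64_t pfn)
{
	bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Walk [start, end) one page at a time; addr is a byte address. */
static void reserve_range(uint8_t *bitmap, uint64_t start, uint64_t end)
{
	for (uint64_t addr = start; addr < end; addr += PAGE_SIZE)
		mark_page_reserved(bitmap, addr >> PAGE_SHIFT); /* not << */
}
```

Reserving the range 0x2000..0x5000 marks pages 2, 3 and 4. Shifting left instead would produce page numbers far outside any reasonably sized bitmap.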
2012-08-17KVM: s390: check cpu_id prior to using itCarsten Otte
commit 4d47555a80495657161a7e71ec3014ff2021e450 upstream. We use the cpu id provided by userspace as array index here. Thus we clearly need to check it first. Ooops. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
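A hedged sketch of the pattern this fix describes: validate an index supplied by userspace before using it to index an array. `MAX_CPUS`, `struct cpu_slot` and `claim_cpu()` are hypothetical names, not the KVM s390 code.

```c
#include <errno.h>

#define MAX_CPUS 64	/* hypothetical table size */

struct cpu_slot {
	int in_use;
};

/* Claim a slot chosen by an untrusted caller. */
static int claim_cpu(struct cpu_slot *slots, unsigned int cpu_id)
{
	/* Check the index first; only then is slots[cpu_id] safe to touch. */
	if (cpu_id >= MAX_CPUS)
		return -EINVAL;
	if (slots[cpu_id].in_use)
		return -EEXIST;

	slots[cpu_id].in_use = 1;
	return 0;
}
```

Taking the index as `unsigned int` means a negative value passed by the caller shows up as a very large number and is rejected by the same comparison.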
2012-08-17x86: Fix compilation bug in kprobes' twobyte_is_boostableJosh Stone
commit 315eb8a2a1b7f335d40ceeeb11b9e067475eb881 upstream. When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] non-const, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, without the const on twobyte_is_boostable: $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 
0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17ARM: davinci: da850 EVM: read mac address from SPI flashRajashekhara, Sudhakar
commit 810198bc9c109489dfadc57131c5183ce6ad2d7d upstream. DA850/OMAP-L138 EMAC driver uses random mac address instead of a fixed one because the mac address is not stuffed into EMAC platform data. This patch provides a function which reads the mac address stored in SPI flash (registered as MTD device) and populates the EMAC platform data. The function which reads the mac address is registered as a callback which gets called upon addition of MTD device. NOTE: In case the MAC address stored in SPI flash is erased, follow the instructions at [1] to restore it. [1] http://processors.wiki.ti.com/index.php/GSG:_OMAP-L138_DVEVM_Additional_Procedures#Restoring_MAC_address_on_SPI_Flash Modifications in v2: Guarded registering the mtd_notifier only when MTD is enabled. Earlier this was handled using mtd_has_partitions() call, but this has been removed in Linux v3.0. Modifications in v3: a. Guarded da850_evm_m25p80_notify_add() function and da850evm_spi_notifier structure with CONFIG_MTD macros. b. Renamed da850_evm_register_mtd_user() function to da850_evm_setup_mac_addr() and removed the struct mtd_notifier argument to this function. c. Passed the da850evm_spi_notifier structure to register_mtd_user() function. Modifications in v4: Moved the da850_evm_setup_mac_addr() function within the first CONFIG_MTD ifdef construct. Signed-off-by: Rajashekhara, Sudhakar <sudhakar.raj@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.Konrad Rzeszutek Wilk
commit ed467e69f16e6b480e2face7bc5963834d025f91 upstream. We have hit a couple of customer bugs where they would like to use those parameters to run an UP kernel - but both of those options turn off important sources of interrupt information so we end up not being able to boot. The correct way is to pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and the kernel will patch itself to be a UP kernel. Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308 Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17xen: x86_32: do not enable iterrupts when returning from exception in ↵Igor Mammedov
interrupt context commit d198d499148a0c64a41b3aba9e7dd43772832b91 upstream. If vmalloc page_fault happens inside of interrupt handler with interrupts disabled then on exit path from exception handler when there are no pending interrupts, the following code (arch/x86/xen/xen-asm_32.S:112): cmpw $0x0001, XEN_vcpu_info_pending(%eax) sete XEN_vcpu_info_mask(%eax) will enable interrupts even if they have been previously disabled according to eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99) testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp) setz XEN_vcpu_info_mask(%eax) Solution is in setting XEN_vcpu_info_mask only when it should be set according to cmpw $0x0001, XEN_vcpu_info_pending(%eax) but not clearing it if there aren't any pending events. Reproducer for bug is attached to RHBZ 707552 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17powerpc/pci: Check devices status property when scanning OF treeSonny Rao
commit 5b339bdf164d8aee394609768f7e2e4415b0252a upstream. We ran into an issue where it looks like we're not properly ignoring a pci device with a non-good status property when we walk the device tree and instantiate the Linux side PCI devices. However, the EEH init code does look for the property and disables EEH on these devices. This leaves us in an inconsistent state where we are poking at a supposedly bad piece of hardware and RTAS will block our config cycles because EEH isn't enabled anyway. Signed-of-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17sparc: fix array bounds error setting up PCIC NMI trapIan Campbell
commit 4a0342ca8e8150bd47e7118a76e300692a1b6b7b upstream. CC arch/sparc/kernel/pcic.o arch/sparc/kernel/pcic.c: In function 'pcic_probe': arch/sparc/kernel/pcic.c:359:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:359:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:8: error: array subscript is above array bounds [-Werror=array-bounds] cc1: all warnings being treated as errors I'm not particularly familiar with sparc but t_nmi (defined in head_32.S via the TRAP_ENTRY macro) and pcic_nmi_trap_patch (defined in entry.S) both appear to be 4 instructions long and I presume from the usage that instructions are int sized. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17sparc: Allow handling signals when stack is corrupted.David S. Miller
commit 5598473a5b40c47a8c5349dd2c2630797169cf1a upstream. If we can't push the pending register windows onto the user's stack, we disallow signal delivery even if the signal would be delivered on a valid separate signal stack. Add a register window save area in the signal frame, and store any unsavable windows there. On sigreturn, if any windows are still queued up in the signal frame, try to push them back onto the stack and if that fails we kill the process immediately. This allows the debug/tst-longjmp_chk2 glibc test case to pass. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-08-17KVM: Ensure all vcpus are consistent with in-kernel irqchip settingsAvi Kivity
commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream. If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Avi Kivity <avi@redhat.com> [PG: in .34 label "unlock_vcpu_destroy" is just "vcpu_destroy"] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86/PCI: do not tie MSI MS-7253 use_crs quirk to BIOS versionJonathan Nieder
commit a97f4f5e524bcd09a85ef0b8821a14d35e69335f upstream. Carlos was getting WARNING: at drivers/pci/pci.c:118 pci_ioremap_bar+0x24/0x52() when probing his sound card, and sound did not work. After adding pci=use_crs to the kernel command line, no more trouble. Ok, we can add a quirk. dmidecode output reveals that this is an MSI MS-7253, for which we already have a quirk, but the short-sighted author tied the quirk to a single BIOS version, making it not kick in on Carlos's machine with BIOS V1.2. If a later BIOS update makes it no longer necessary to look at the _CRS info it will still be harmless, so let's stop trying to guess which versions have and don't have accurate _CRS tables. Addresses https://bugtrack.alsa-project.org/alsa-bug/view.php?id=5533 Also see <https://bugzilla.kernel.org/show_bug.cgi?id=42619>. Reported-by: Carlos Luna <caralu74@gmail.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86/PCI: use host bridge _CRS info on MSI MS-7253Jonathan Nieder
commit 8411371709610c826bf65684f886bfdfb5780ca1 upstream. In the spirit of commit 29cf7a30f8a0 ("x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE"), this DMI quirk turns on "pci_use_crs" by default on a board that needs it. This fixes boot failures and oopses introduced in 3e3da00c01d0 ("x86/pci: AMD one chain system to use pci read out res"). The quirk is quite targeted (to a specific board and BIOS version) for two reasons: (1) to emphasize that this method of tackling the problem one quirk at a time is a little insane (2) to give BIOS vendors an opportunity to use simpler tables and allow us to return to generic behavior (whatever that happens to be) with a later BIOS update In other words, I am not at all happy with having quirks like this. But it is even worse for the kernel not to work out of the box on these machines, so... Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42619 Reported-by: Svante Signell <svante.signell@telia.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17score: fix off-by-one index into syscall tableDan Rosenberg
commit c25a785d6647984505fa165b5cd84cfc9a95970b upstream. If the provided system call number is equal to __NR_syscalls, the current check will pass and a function pointer just after the system call table may be called, since sys_call_table is an array with total size __NR_syscalls. Whether or not this is a security bug depends on what the compiler puts immediately after the system call table. It's likely that this won't do anything bad because there is an additional NULL check on the syscall entry, but if there happens to be a non-NULL value immediately after the system call table, this may result in local privilege escalation. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: <stable@vger.kernel.org> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
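The off-by-one described above can be illustrated with a small dispatcher. This is a sketch under assumed names (`NR_SYSCALLS`, `sys_call_table_demo`, `dispatch()` and the two handlers are hypothetical), not the score architecture's entry code. A table with `NR_SYSCALLS` entries has valid indices 0 through `NR_SYSCALLS - 1`, so the rejection test must use `>=` rather than `>`.

```c
#include <errno.h>
#include <stddef.h>

#define NR_SYSCALLS 4	/* hypothetical table size */

typedef long (*syscall_fn)(void);

static long sys_first(void)  { return 10; }
static long sys_second(void) { return 20; }

/* Entries 2 and 3 are intentionally left NULL (unimplemented). */
static const syscall_fn sys_call_table_demo[NR_SYSCALLS] = {
	sys_first,
	sys_second,
};

static long dispatch(unsigned long nr)
{
	syscall_fn fn;

	/* nr == NR_SYSCALLS is already one past the end of the table. */
	if (nr >= NR_SYSCALLS)
		return -ENOSYS;

	fn = sys_call_table_demo[nr];
	if (fn == NULL)		/* the additional NULL check the text mentions */
		return -ENOSYS;

	return fn();
}
```

With `nr > NR_SYSCALLS` as the test, `dispatch(NR_SYSCALLS)` would read whatever the compiler placed after the table, which is the situation the commit message warns about.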
2012-05-17tty: icount changeover for other main devicesAlan Cox
commit 0587102cf9f427c185bfdeb2cef41e13ee0264b1 upstream. Again basically cut and paste Convert the main driver set to use the hooks for GICOUNT Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86, UV: Remove UV delay in starting slave cpusJack Steiner
commit 05e33fc20ea5e493a2a1e7f1d04f43cdf89f83ed upstream. Delete the 10 msec delay between the INIT and SIPI when starting slave cpus. I can find no requirement for this delay. BIOS also has similar code sequences without the delay. Removing the delay reduces boot time by 40 sec. Every bit helps. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20110805140900.GA6774@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86-32, vdso: On system call restart after SYSENTER, use int $0x80H. Peter Anvin
commit 7ca0758cdb7c241cb4e0490a8d95f0eb5b861daf upstream. When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle the arguments to match the int $0x80 calling convention. This was probably a design mistake, but it's what it is now. This causes errors if the system call has to be restarted. For SYSENTER, we have to invoke the instruction from the vdso as the return address is hardcoded. Accordingly, we can simply replace the jump in the vdso with an int $0x80 instruction and use the slower entry point for a post-restart. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17powerpc: pseries: Fix kexec on machines with more than 4TB of RAMAnton Blanchard
commit bed9a31527af8ff3dfbad62a1a42815cef4baab7 upstream. On a box with 8TB of RAM the MMU hashtable is 64GB in size. That means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed int to store the index which will overflow at 2G. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
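The arithmetic behind this fix can be checked directly. The sketch below assumes 16 bytes per hash table entry, which is what the commit's own figures imply (64GB of table giving 4G PTEs); `hpte_count()` and `fits_in_int()` are hypothetical helpers for illustration.

```c
#include <limits.h>
#include <stdint.h>

#define HPTE_SIZE 16u	/* assumed bytes per entry: 64GB / 4G entries */

/* Number of entries in a hash table of the given size in bytes. */
static uint64_t hpte_count(uint64_t hash_table_bytes)
{
	return hash_table_bytes / HPTE_SIZE;
}

/* Can a signed int index reach every entry without overflowing? */
static int fits_in_int(uint64_t count)
{
	return count <= (uint64_t)INT_MAX;
}
```

For a 64GB table, `hpte_count()` yields 2^32 entries, which is beyond `INT_MAX` (2^31 - 1 with a 32-bit int), so a signed int index overflows at 2G as described. A wider unsigned type for the index avoids the problem.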
2012-05-17powerpc: Fix device tree claim codeAnton Blanchard
commit 966728dd88b4026ec58fee169ccceaeaf56ef120 upstream. I have a box that fails in OF during boot with: DEFAULT CATCH!, exception-handler=fff00400 at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002 ie "IBM,Logh". OF got corrupted with a device tree string. Looking at make_room and alloc_up, we claim the first chunk (1 MB) but we never claim any more. mem_end is always set to alloc_top which is the top of our available address space, guaranteeing we will never call alloc_up and claim more memory. Also alloc_up wasn't setting alloc_bottom to the bottom of the available address space. This doesn't help the box to boot, but we at least fail with an obvious error. We could relocate the device tree in a future patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86: HPET: Chose a paranoid safe value for the ETIME checkThomas Gleixner
commit f1c18071ad70e2a78ab31fc26a18fcfa954a05c6 upstream. commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty) chose 8 HPET cycles as a safe value for the ETIME check, as we had the confirmation that the posted write to the comparator register is delayed by two HPET clock cycles on Intel chipsets which showed readback problems. After that patch hit mainline we got reports from machines with newer AMD chipsets which seem to have an even longer delay. See http://thread.gmane.org/gmane.linux.kernel/1054283 and http://thread.gmane.org/gmane.linux.kernel/1069458 for further information. Boris tried to come up with an ACPI based selection of the minimum HPET cycles, but this failed on a couple of test machines. And of course we did not get any useful information from the hardware folks. For now our only option is to choose a paranoid high and safe value for the minimum HPET cycles used by the ETIME check. Adjust the minimum ns value for the HPET clockevent accordingly. Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-17x86: Hpet: Avoid the comparator readback penaltyThomas Gleixner
commit 995bd3bb5c78f3ff71339803c0b8337ed36d64fb upstream. Due to the overly intelligent design of HPETs, we need to work around the problem that the compare value which we write is already behind the actual counter value at the point where the value hits the real compare register. This happens for two reasons: 1) We read out the counter, add the delta and write the result to the compare register. When a NMI or SMI hits between the read out and the write then the counter can be ahead of the event already 2) The write to the compare register is delayed by up to two HPET cycles in certain chipsets. We worked around this by reading back the compare register to make sure that the written value has hit the hardware. For certain ICH9+ chipsets this can require two readouts, as the first one can return the previous compare register value. That's bad performance wise for the normal case where the event is far enough in the future. As we already know that the write can be delayed by up to two cycles we can avoid the read back of the compare register completely if we make the decision whether the delta has elapsed already or not based on the following calculation: cmp = event - actual_count; If cmp is less than 8 HPET clock cycles, then we decide that the event has happened already and return -ETIME. That covers both #1 and #2 above.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nix <nix@esperi.org.uk> Tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Damien Wyart <damien.wyart@free.fr> Tested-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6> [PG: diffstat differs from 995bd3bb since deleted comment was re-wrapped] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
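The decision rule quoted in this commit (`cmp = event - actual_count`, with 8 cycles as the threshold) can be sketched as below. This is an illustration, not the kernel's hpet.c: it assumes a 32-bit counter and interprets the difference as signed so that a counter already past the event, or a wrapped counter, is handled. `HPET_MIN_CYCLES_DEMO` and `hpet_event_status()` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

#define HPET_MIN_CYCLES_DEMO 8	/* threshold taken from the commit text */

/*
 * Decide, without reading back the comparator, whether a programmed
 * event is far enough in the future to be trusted.
 */
static int hpet_event_status(uint32_t event, uint32_t actual_count)
{
	/* Unsigned subtraction wraps; the signed view gives the distance. */
	int32_t cmp = (int32_t)(event - actual_count);

	return cmp < HPET_MIN_CYCLES_DEMO ? -ETIME : 0;
}
```

An event 100 cycles ahead returns 0, while an event 5 cycles ahead or 5 cycles behind returns -ETIME, so the caller can reprogram instead of waiting for the counter to wrap.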
2012-05-17alpha: fix several security issuesDan Rosenberg
commit 21c5977a836e399fc710ff2c5367845ed5c2527f upstream. Fix several security issues in Alpha-specific syscalls. Untested, but mostly trivial. 1. Signedness issue in osf_getdomainname allows copying out-of-bounds kernel memory to userland. 2. Signedness issue in osf_sysinfo allows copying large amounts of kernel memory to userland. 3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy size, allowing copying large amounts of kernel memory to userland. 4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows privilege escalation via writing return value of sys_wait4 to kernel memory. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
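The signedness issues in items 1 and 2 share one pattern: a signed length from userspace is only checked against an upper bound, so a negative value passes the check and later becomes a huge unsigned copy size. The following is a generic sketch of a safe check, not the Alpha syscall code; `copy_bounded()` is a hypothetical helper and `memcpy` stands in for a copy to userland.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most src_size bytes, honouring a caller-supplied signed length.
 * Returns the number of bytes copied or a negative errno value.
 */
static long copy_bounded(char *dst, const char *src, size_t src_size,
			 int user_len)
{
	size_t n;

	/* Reject negatives before any conversion to an unsigned type. */
	if (user_len < 0)
		return -EINVAL;

	n = (size_t)user_len;
	if (n > src_size)	/* clamp to the maximum, not the minimum */
		n = src_size;

	memcpy(dst, src, n);
	return (long)n;
}
```

The clamp direction matters as well: item 3 describes a bound applied to the minimum instead of the maximum copy size, which leaves large requests unchecked.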