summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/vm86_32.c
AgeCommit message (Collapse)Author
2018-03-22x86/vm86/32: Fix POPF emulationAndy Lutomirski
commit b5069782453459f6ec1fdeb495d9901a4545fcb5 upstream. POPF would trap if VIP was set regardless of whether IF was set. Fix it. Suggested-by: Stas Sergeev <stsp@list.ru> Reported-by: Bart Oldeman <bartoldeman@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 5ed92a8ab71f ("x86/vm86: Use the normal pt_regs area for vm86") Link: http://lkml.kernel.org/r/ce95f40556e7b2178b6bc06ee9557827ff94bd28.1521003603.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25x86/vm86: Fix unused variable warning if THP is disabledKirill A. Shutemov
commit 3ba5b5ea7dc3a10ef50819b43a9f8de2705f4eec upstream. GCC complains about unused variable 'vma' in mark_screen_rdonly() if THP is disabled: arch/x86/kernel/vm86_32.c: In function ‘mark_screen_rdonly’: arch/x86/kernel/vm86_32.c:180:26: warning: unused variable ‘vma’ [-Wunused-variable] struct vm_area_struct *vma = find_vma(mm, 0xA0000); That's silly. pmd_trans_huge() resolves to 0 when THP is disabled, so the whole block should be eliminated. Moving the variable declaration outside the if() block shuts GCC up. Reported-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Carlos O'Donell <carlos@redhat.com> Link: http://lkml.kernel.org/r/20170213125228.63645-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()Andy Lutomirski
commit 9ccee2373f0658f234727700e619df097ba57023 upstream. mark_screen_rdonly() is the last remaining caller of flush_tlb(). flush_tlb_mm_range() is potentially faster and isn't obsolete. Compile-tested only because I don't know whether software that uses this mechanism even exists. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08x86, bitops: remove use of "sbb" to return CFH. Peter Anvin
Use SETC instead of SBB to return the value of CF from assembly. Using SETcc enables uniformity with other flags-returning pieces of assembly code. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1465414726-197858-2-git-send-email-hpa@linux.intel.com Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-30x86/cpufeature: Replace the old static_cpu_has() with safe variantBorislav Petkov
So the old one didn't work properly before alternatives had run. And it was supposed to provide an optimized JMP because the assumption was that the offset it is jumping to is within a signed byte and thus a two-byte JMP. So I did an x86_64 allyesconfig build and dumped all possible sites where static_cpu_has() was used. The optimization amounted to all in all 12(!) places where static_cpu_has() had generated a 2-byte JMP. Which has saved us a whopping 36 bytes! This clearly is not worth the trouble so we can remove it. The only place where the optimization might count - in __switch_to() - we will handle differently. But that's not subject of this patch. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-15thp: rename split_huge_page_pmd() to split_huge_pmd()Kirill A. Shutemov
We are going to decouple splitting THP PMD from splitting underlying compound page. This patch renames split_huge_page_pmd*() functions to split_huge_pmd*() to reflect the fact that it doesn't imply page splitting, only PMD. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-19x86/cpufeature: Remove unused and seldomly used cpu_has_xx macrosBorislav Petkov
Those are stupid and code should use static_cpu_has_safe() or boot_cpu_has() instead. Kill the least used and unused ones. The remaining ones need more careful inspection before a conversion can happen. On the TODO. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de Cc: David Sterba <dsterba@suse.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matt Mackall <mpm@selenic.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-05x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0Andy Lutomirski
vm86 exposes an interesting attack surface against the entry code. Since vm86 is mostly useless anyway if mmap_min_addr != 0, just turn it off in that case. There are some reports that vbetool can work despite setting mmap_min_addr to zero. This shouldn't break that use case, as CAP_SYS_RAWIO already overrides mmap_min_addr. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Rename vm86->v86flags and v86maskBrian Gerst
Rename v86flags to veflags, and v86mask to veflags_mask. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-9-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Rename vm86->vm86_info to user_vm86Brian Gerst
Make it clearer that this is the pointer to the userspace vm86 state area. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-8-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Clean up vm86.h includesBrian Gerst
vm86.h was being implicitly included in alot of places via processor.h, which in turn got it from math_emu.h. Break that chain and explicitly include vm86.h in all files that need it. Also remove unused vm86 field from math_emu_info. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-7-git-send-email-brgerst@gmail.com [ Fixed build failure. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Use the normal pt_regs area for vm86Brian Gerst
Change to use the normal pt_regs area to enter and exit vm86 mode. This is done by increasing the padding at the top of the stack to make room for the extra vm86 segment slots in the IRET frame. It then saves the 32-bit regs in the off-stack vm86 data, and copies in the vm86 regs. Exiting back to 32-bit mode does the reverse. This allows removing the hacks to jump directly into the exit asm code due to having to change the stack pointer. Returning normally from the vm86 syscall and the exception handlers allows things like ptrace and auditing to work properly. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Eliminate 'struct kernel_vm86_struct'Brian Gerst
Now there is no vm86-specific data left on the kernel stack while in userspace, except for the 32-bit regs. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'Brian Gerst
Move the non-regs fields to the off-stack data. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/vm86: Move vm86 fields out of 'thread_struct'Brian Gerst
Allocate a separate structure for the vm86 fields. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-2-git-send-email-brgerst@gmail.com [ Build fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21x86/entry/vm86: Move userspace accesses to do_sys_vm86()Brian Gerst
Move the userspace accesses down into the common function in preparation for the next set of patches. Also change to copying the fields explicitly instead of assuming a fixed order in pt_regs and the kernel data structures. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437354550-25858-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21x86/entry/vm86: Preserve 'orig_ax'Brian Gerst
There is no legitimate reason for usermode to modify the 'orig_ax' field on entry to vm86 mode, so copy it from the 32-bit regs. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437354550-25858-3-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21x86/entry/vm86: Clean up saved_fs/gsBrian Gerst
There is no need to save FS and non-lazy GS outside the 32-bit regs. Lazy GS still needs to be saved because it wasn't saved on syscall entry. Save it in the gs slot of regs32, which is present but unused. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1437354550-25858-2-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06x86/asm/entry: Rename 'init_tss' to 'cpu_tss'Andy Lutomirski
It has nothing to do with init -- there's only one TSS per cpu. Other names considered include: - current_tss: Confusing because we never switch the tss. - singleton_tss: Too long. This patch was generated with 's/init_tss/cpu_tss/g'. Followup patches will fix INIT_TSS and INIT_TSS_IST by hand. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-02x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)Alexander van Heukelum
Commit 49cb25e9290 x86: 'get rid of pt_regs argument in vm86/vm86old' got rid of the pt_regs stub for sys_vm86old and sys_vm86. The functions were, however, not changed to use the calling convention for syscalls. [AV: killed asmlinkage_protect() - it's done automatically now] Reported-and-tested-by: Hans de Bruin <jmdebruin@xmsnet.nl> Cc: stable@vger.kernel.org Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03x86: get rid of pt_regs argument in vm86/vm86oldAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-12thp: change split_huge_page_pmd() interfaceKirill A. Shutemov
Pass vma instead of mm and add address parameter. In most cases we already have vma on the stack. We provides split_huge_page_pmd_mm() for few cases when we have mm, but not vma. This change is preparation to huge zero pmd splitting implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-20x86: get rid of TIF_IRET hackeryAl Viro
TIF_NOTIFY_RESUME will work in precisely the same way; all that is achieved by TIF_IRET is appearing that there's some work to be done, so we end up on the iret exit path. Just use NOTIFY_RESUME. And for execve() do that in 32bit start_thread(), not sys_execve() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-06x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>Joe Perches
Use a more current logging style: - Bare printks should have a KERN_<LEVEL> for consistency's sake - Add pr_fmt where appropriate - Neaten some macro definitions - Convert some Ok output to OK - Use "%s: ", __func__ in pr_fmt for summit - Convert some printks to pr_<level> Message output is not identical in all cases. Signed-off-by: Joe Perches <joe@perches.com> Cc: levinsasha928@gmail.com Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop [ merged two similar patches, tidied up the changelog ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-29Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Peter Anvin: "The biggest textual change is the cleanup to use symbolic constants for x86 trap values. The only *functional* change and the reason for the x86/x32 dependency is the move of is_ia32_task() into <asm/thread_info.h> so that it can be used in other code that needs to understand if a system call comes from the compat entry point (and therefore uses i386 system call numbers) or not. One intended user for that is the BPF system call filter. Moving it out of <asm/compat.h> means we can define it unconditionally, returning always true on i386." * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move is_ia32_task to asm/thread_info.h from asm/compat.h x86: Rename trap_no to trap_nr in thread_struct x86: Use enum instead of literals for trap values
2012-03-21mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read modeAndrea Arcangeli
In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge. It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge(). Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped. Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()). The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad). if (pmd_trans_huge(*pmd)) { if (next-addr != HPAGE_PMD_SIZE) { VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)); split_huge_page_pmd(vma->vm_mm, pmd); } else if (zap_huge_pmd(tlb, vma, pmd, addr)) continue; /* fall through */ } if (pmd_none_or_clear_bad(pmd)) Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179. The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference. ====== start quote ======= mapcount 0 page_mapcount 1 kernel BUG at mm/huge_memory.c:1384! At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console: mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7). The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry. 143 void pmd_clear_bad(pmd_t *pmd) 144 { -> 145 pmd_ERROR(*pmd); 146 pmd_clear(pmd); 147 } After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency. 1381 if (mapcount != page_mapcount(page)) 1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n", 1383 mapcount, page_mapcount(page)); -> 1384 BUG_ON(mapcount != page_mapcount(page)); The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range. virtual address space .---------------------. | | | | .-|---------------------| | | | | | |<-- B(fault) | | | 2 MB | |/////////////////////|-. huge < |/////////////////////| > A(range) page | |/////////////////////|-' | | | | | | '-|---------------------| | | | | '---------------------' - Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture. sys_madvise // Acquire the semaphore in shared mode. down_read(&current->mm->mmap_sem) ... madvise_vma switch (behavior) case MADV_DONTNEED: madvise_dontneed zap_page_range unmap_vmas unmap_page_range zap_pud_range zap_pmd_range // // Assume that this huge page has never been accessed. // I.e. content of the PMD entry is zero (not mapped). // if (pmd_trans_huge(*pmd)) { // We don't get here due to the above assumption. } // // Assume that Thread B incurred a page fault and .---------> // sneaks in here as shown below. | // | if (pmd_none_or_clear_bad(pmd)) | { | if (unlikely(pmd_bad(*pmd))) | pmd_clear_bad | { | pmd_ERROR | // Log "bad pmd ..." message here. | pmd_clear | // Clear the page's PMD entry. | // Thread B incremented the map count | // in page_add_new_anon_rmap(), but | // now the page is no longer mapped | // by a PMD entry (-> inconsistency). | } | } | v - Thread B is handling a page fault on virtual address "B(fault)" shown in the picture. ... do_page_fault __do_page_fault // Acquire the semaphore in shared mode. down_read_trylock(&mm->mmap_sem) ... handle_mm_fault if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) // We get here due to the above assumption (PMD entry is zero). do_huge_pmd_anonymous_page alloc_hugepage_vma // Allocate a new transparent huge page here. ... __do_huge_pmd_anonymous_page ... spin_lock(&mm->page_table_lock) ... page_add_new_anon_rmap // Here we increment the page's map count (starts at -1). atomic_set(&page->_mapcount, 0) set_pmd_at // Here we set the page's PMD entry which will be cleared // when Thread A calls pmd_clear_bad(). ... spin_unlock(&mm->page_table_lock) The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock. ====== end quote ======= [akpm@linux-foundation.org: checkpatch fixes] Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.38+] Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-13x86: Rename trap_no to trap_nr in thread_structSrikar Dronamraju
There are precedences of trap number being referred to as trap_nr. However thread struct refers trap number as trap_no. Change it to trap_nr. Also use enum instead of left-over literals for trap values. This is pure cleanup, no functional change intended. Suggested-by: Ingo Molnar <mingo@eltu.hu> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com [ Fixed the math-emu build ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-17x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=nAl Viro
JONGMAN HEO reports: With current linus git (commit a25a2b84), I got following build error, arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86': arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit' make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1 OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n) It's due to commit d7e7528bcd45: "Audit: push audit success and retcode into arch ptrace.h". Reported-by: JONGMAN HEO <jongman.heo@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-17Audit: push audit success and retcode into arch ptrace.hEric Paris
The audit system previously expected arches calling to audit_syscall_exit to supply as arguments if the syscall was a success and what the return code was. Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things by converting from negative retcodes to an audit internal magic value stating success or failure. This helper was wrong and could indicate that a valid pointer returned to userspace was a failed syscall. The fix is to fix the layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it in turns calls back into arch code to collect the return value and to determine if the syscall was a success or failure. We also define a generic is_syscall_success() macro which determines success/failure based on if the value is < -MAX_ERRNO. This works for arches like x86 which do not use a separate mechanism to indicate syscall failure. We make both the is_syscall_success() and regs_return_value() static inlines instead of macros. The reason is because the audit function must take a void* for the regs. (uml calls theirs struct uml_pt_regs instead of just struct pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit function takes a void* we need to use static inlines to cast it back to the arch correct structure to dereference it. The other major change is that on some arches, like ia64, MIPS and ppc, we change regs_return_value() to give us the negative value on syscall failure. THE only other user of this macro, kretprobe_example.c, won't notice and it makes the value signed consistently for the audit functions across all archs. In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old audit code as the return value. But the ptrace_64.h code defined the macro regs_return_value() as regs[3]. I have no idea which one is correct, but this patch now uses the regs_return_value() function, so it now uses regs[3]. For powerpc we previously used regs->result but now use the regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is always positive so the regs_return_value(), much like ia64 makes it negative before calling the audit code when appropriate. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion] Acked-by: Tony Luck <tony.luck@intel.com> [for ia64] Acked-by: Richard Weinberger <richard@nod.at> [for uml] Acked-by: David S. Miller <davem@davemloft.net> [for sparc] Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
2011-01-13thp: split_huge_page_mm/vmaAndrea Arcangeli
split_huge_page_pmd compat code. Each one of those would need to be expanded to hundred of lines of complex code without a fully reliable split_huge_page_pmd design. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-23x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.Bart Oldeman
Impact: fix kernel bug such as: BUG: scheduling while atomic: dosemu.bin/19680/0x00000004 See also Ubuntu bug 455067 at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067 Commits 4915a35e35a037254550a2ba9f367a812bc37d40 ("Use preempt_conditional_sti/cli in do_int3, like on x86_64.") and 3d2a71a596bd9c761c8487a2178e95f8a61da083 ("x86, traps: converge do_debug handlers") started disabling preemption in int1 and int3 handlers on i386. The problem with vm86 is that the call to handle_vm86_trap() may jump straight to entry_32.S and never returns so preempt is never enabled again, and there is an imbalance in the preempt count. Commit be716615fe596ee117292dc615e95f707fb67fd1 ("x86, vm86: fix preemption bug"), which was later (accidentally?) reverted by commit 08d68323d1f0c34452e614263b212ca556dae47f ("hw-breakpoints: modifying generic debug exception to use thread-specific debug registers") fixed the problem for debug exceptions but not for breakpoints. There are three solutions to this problem. 1. Reenable preemption before calling handle_vm86_trap(). This was the approach that was later reverted. 2. Do not disable preemption for i386 in breakpoint and debug handlers. This was the situation before October 2008. As far as I understand preemption only needs to be disabled on x86_64 because a seperate stack is used, but it's nice to have things work the same way on i386 and x86_64. 3. Let handle_vm86_trap() return instead of jumping to assembly code. By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME, the code in entry_32.S is instructed to return to 32 bit mode from V86 mode. The logic in entry_32.S was already present to handle signals. (I chose TIF_IRET because it's slightly more efficient in do_notify_resume() in signal.c, but in fact TIF_IRET can probably be replaced by TIF_NOTIFY_RESUME everywhere.) I'm submitting approach 3, because I believe it is the most elegant and prevents future confusion. Still, an obvious preempt_conditional_cli(regs); is necessary in traps.c to correct the bug. [ hpa: This is technically a regression, but because: 1. the regression is so old, 2. the patch seems relatively high risk, justifying more testing, and 3. we're late in the 2.6.36-rc cycle, I'm queuing it up for the 2.6.37 merge window. It might, however, justify as a -stable backport at a latter time, hence Cc: stable. ] Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net> LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-12-09x86, 32-bit: Convert sys_vm86 & sys_vm86oldBrian Gerst
Convert these to new PTREGSCALL stubs. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-10Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clear TS in irq_ts_save() when in an atomic section x86: Detect use of extended APIC ID for AMD CPUs x86: memtest: remove 64-bit division x86, UV: Fix macros for multiple coherency domains x86: Fix non-lazy GS handling in sys_vm86() x86: Add quirk for reboot stalls on a Dell Optiplex 360 x86: Fix UV BAU activation descriptor init
2009-06-07x86: Fix non-lazy GS handling in sys_vm86()Lubomir Rintel
This fixes a stack corruption panic or null dereference oops due to a bad GS in resume_userspace() when returning from sys_vm86() and calling lockdep_sys_exit(). Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR enabled. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1244384628.2323.4.camel@bimbo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-06x86: use symbolic name for VM86_SIGNAL when used as vm86 default returnSamuel Bronson
This code has apparently used "0" and not VM86_SIGNAL since Linux 1.1.9, when Linus added VM86_SIGNAL to vm86.h. This patch changes the code to use the symbolic name. The magic 0 tripped me up in trying to extend the vm86(2) manpage to actually explain vm86()'s interface -- my greps for VM86_SIGNAL came up fruitless. [ Impact: cleanup; no object code change ] Signed-off-by: Samuel Bronson <naesten@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-11x86: use regparm(3) for passed-in pt_regs pointerBrian Gerst
Some syscalls need to access the pt_regs structure, either to copy user register state or to modifiy it. This patch adds stubs to load the address of the pt_regs struct into the %eax register, and changes the syscalls to take the pointer as an argument instead of relying on the assumption that the pt_regs structure overlaps the function arguments. Drop the use of regparm(1) due to concern about gcc bugs, and to move in the direction of the eventual removal of regparm(0) for asmlinkage. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-11x86: pass in pt_regs pointer for syscalls that need itBrian Gerst
Some syscalls need to access the pt_regs structure, either to copy user register state or to modifiy it. This patch adds stubs to load the address of the pt_regs struct into the %eax register, and changes the syscalls to regparm(1) to receive the pt_regs pointer as the first argument. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-10x86: add %gs accessors for x86_32Tejun Heo
Impact: cleanup On x86_32, %gs is handled lazily. It's not saved and restored on kernel entry/exit but only when necessary which usually is during task switch but there are few other places. Currently, it's done by calling savesegment() and loadsegment() explicitly. Define get_user_gs(), set_user_gs() and task_user_gs() and use them instead. While at it, clean up register access macros in signal.c. This cleans up code a bit and will help future changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22x86: Introducing asm/syscalls.hJaswinder Singh
Declaring arch-dependent syscalls for x86 architecture Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
2008-04-17x86: replace most VM86 flags with flags from processor-flags.hgorcunov@gmail.com
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: cleanup - rename VM_MASK to X86_VM_MASKgorcunov@gmail.com
This patch renames VM_MASK to X86_VM_MASK (which in turn defined as alias to X86_EFLAGS_VM) to better distinguish from virtual memory flags. We can't just use X86_EFLAGS_VM instead because it is also used for conditional compilation Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: handle_vm86_trap cleanupRoland McGrath
Use force_sig in handle_vm86_trap like other machine traps do. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: coding style fixes to arch/x86/kernel/vm86_32.cPaolo Ciarrocchi
Before: total: 64 errors, 18 warnings, 840 lines checked After: total: 12 errors, 15 warnings, 844 lines checked No code changed: arch/x86/kernel/vm86_32.o: text data bss dec hex filename 4449 28 132 4609 1201 vm86_32.o.before 4449 28 132 4609 1201 vm86_32.o.after md5: e4e51ed7689d17f04148554a3c6d5bb6 vm86_32.o.before.asm e4e51ed7689d17f04148554a3c6d5bb6 vm86_32.o.after.asm Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30arch/x86/: spelling fixesJoe Perches
Spelling fixes. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: remove all definitions with fastcallHarvey Harrison
fastcall is always defined to be empty, remove it from arch/x86 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: remove last users of FASTCALLHarvey Harrison
FASTCALL() is always empty. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: use generic register name in the thread and tss structuresH. Peter Anvin
This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: rename the struct pt_regs members for 32/64-bit consistencyH. Peter Anvin
We have a lot of code which differs only by the naming of specific members of structures that contain registers. In order to enable additional unifications, this patch drops the e- or r- size prefix from the register names in struct pt_regs, and drops the x- prefixes for segment registers on the 32-bit side. This patch also performs the equivalent renames in some additional places that might be candidates for unification in the future. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-13Delete filenames in comments.Dave Jones
Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally: - fix up comment formatting in scx200_32.c - Remove a credit from myself in setup_64.c from a time when we had no SCM - remove longwinded history from tsc_32.c which can be figured out from git. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11i386: move kernelThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>