summaryrefslogtreecommitdiff
path: root/arch/arm64/mm/mmu.c
AgeCommit message (Collapse)Author
2020-01-09mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand
commit feee6b2989165631b17ac6d4ccdbf6759254e85a upstream. We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', ↵Will Deacon
'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core * for-next/52-bit-kva: (25 commits) Support for 52-bit virtual addressing in kernel space * for-next/cpu-topology: (9 commits) Move CPU topology parsing into core code and add support for ACPI 6.3 * for-next/error-injection: (2 commits) Support for function error injection via kprobes * for-next/perf: (8 commits) Support for i.MX8 DDR PMU and proper SMMUv3 group validation * for-next/psci-cpuidle: (7 commits) Move PSCI idle code into a new CPUidle driver * for-next/rng: (4 commits) Support for 'rng-seed' property being passed in the devicetree * for-next/smpboot: (3 commits) Reduce fragility of secondary CPU bringup in debug configurations * for-next/tbi: (10 commits) Introduce new syscall ABI with relaxed requirements for pointer tags * for-next/tlbi: (6 commits) Handle spurious page faults arising from kernel space
2019-08-28arm64: fix fixmap copy for 16K pages and 48-bit VAMark Rutland
With 16K pages and 48-bit VAs, the PGD level of table has two entries, and so the fixmap shares a PGD with the kernel image. Since commit: f9040773b7bbbd9e ("arm64: move kernel image to base of vmalloc area") ... we copy the existing fixmap to the new fine-grained page tables at the PUD level in this case. When walking to the new PUD, we forgot to offset the PGD entry and always used the PGD entry at index 0, but this worked as the kernel image and fixmap were in the low half of the TTBR1 address space. As of commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... the kernel image and fixmap are in the high half of the TTBR1 address space, and hence use the PGD at index 1, but we didn't update the fixmap copying code to account for this. Thus, we'll erroneously try to copy the fixmap slots into a PUD under the PGD entry at index 0. At the point we do so this PGD entry has not been initialised, and thus we'll try to write a value to a small offset from physical address 0, causing a number of potential problems. Fix this be correctly offsetting the PGD. This is split over a few steps for legibility. Fixes: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") Reported-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Steve Capper <Steve.Capper@arm.com> Tested-by: Steve Capper <Steve.Capper@arm.com> Tested-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-23arm64: map FDT as RW for early_init_dt_scan()Hsin-Yi Wang
Currently in arm64, FDT is mapped to RO before it's passed to early_init_dt_scan(). However, there might be some codes (eg. commit "fdt: add support for rng-seed") that need to modify FDT during init. Map FDT to RO after early fixups are done. Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-14arm64: memory: rename VA_START to PAGE_ENDMark Rutland
Prior to commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... VA_START described the start of the TTBR1 address space for a given VA size described by VA_BITS, where all kernel mappings began. Since that commit, VA_START described a portion midway through the address space, where the linear map ends and other kernel mappings begin. To avoid confusion, let's rename VA_START to PAGE_END, making it clear that it's not the start of the TTBR1 address space and implying that it's related to PAGE_OFFSET. Comments and other mnemonics are updated accordingly, along with a typo fix in the decription of VMEMMAP_SIZE. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Remove vabits_userSteve Capper
Previous patches have enabled 52-bit kernel + user VAs and there is no longer any scenario where user VA != kernel VA size. This patch removes the, now redundant, vabits_user variable and replaces usage with vabits_actual where appropriate. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Introduce vabits_actualSteve Capper
In order to support 52-bit kernel addresses detectable at boot time, one needs to know the actual VA_BITS detected. A new variable vabits_actual is introduced in this commit and employed for the KVM hypervisor layout, KASAN, fault handling and phys-to/from-virt translation where there would normally be compile time constants. In order to maintain performance in phys_to_virt, another variable physvirt_offset is introduced. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Flip kernel VA spaceSteve Capper
In order to allow for a KASAN shadow that changes size at boot time, one must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the start address. Also, it is highly desirable to maintain the same function addresses in the kernel .text between VA sizes. Both of these requirements necessitate us to flip the kernel address space halves s.t. the direct linear map occupies the lower addresses. This patch puts the direct linear map in the lower addresses of the kernel VA range and everything else in the higher ranges. We need to adjust: *) KASAN shadow region placement logic, *) KASAN_SHADOW_OFFSET computation logic, *) virt_to_phys, phys_to_virt checks, *) page table dumper. These are all small changes, that need to take place atomically, so they are bundled into this commit. As part of the re-arrangement, a guard region of 2MB (to preserve alignment for fixed map) is added after the vmemmap. Otherwise the vmemmap could intersect with IS_ERR pointers. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-07-18mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18arm64/mm: add temporary arch_remove_memory() implementationDavid Hildenbrand
A proper arch_remove_memory() implementation is on its way, which also cleanly removes page tables in arch_add_memory() in case something goes wrong. As we want to use arch_remove_memory() in case something goes wrong during memory hotplug after arch_add_memory() finished, let's add a temporary hack that is sufficient enough until we get a proper implementation that cleans up page table entries. We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up patches. Link: http://lkml.kernel.org/r/20190527111152.16324-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16mm/ioremap: probe platform for p4d huge map supportAnshuman Khandual
Finish up what commit c2febafc6773 ("mm: convert generic code to 5-level paging") started while levelling up P4D huge mapping support at par with PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added which just maintains status quo (P4D huge map not supported) on x86, arm64 and powerpc. When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the arch about the support, hence runtime effects are minimal. Link: http://lkml.kernel.org/r/1561699231-20991-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12arm64: switch to generic version of pte allocationMike Rapoport
The PTE allocations in arm64 are identical to the generic ones modulo the GFP flags. Using the generic pte_alloc_one() functions ensures that the user page tables are allocated with __GFP_ACCOUNT set. The arm64 definition of PGALLOC_GFP is removed and replaced with GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now using GFP_PGTABLE_USER. The mappings created with create_pgd_mapping() are now using GFP_PGTABLE_KERNEL. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm64 is identical to the generic one and can be simply dropped. [cai@lca.pw: fix a bogus GFP flag in pgd_alloc()] Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/ [and fix it more] Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/ Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-07arm64: Fix comment after #endifOdin Ugedal
The config value used in the if was changed in b433dce056d3814dc4b33e5a8a533d6401ffcfb0, but the comment on the corresponding end was not changed. Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-04arm64/mm: Change BUG_ON() to VM_BUG_ON() in [pmd|pud]_set_huge()Anshuman Khandual
There are no callers for the functions which will pass unaligned physical addresses. Hence just change these BUG_ON() checks into VM_BUG_ON() which gets compiled out unless CONFIG_VM_DEBUG is enabled. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-04arm64/mm: Simplify protection flag creation for kernel huge mappingsAnshuman Khandual
Even though they have got the same value, PMD_TYPE_SECT and PUD_TYPE_SECT get used for kernel huge mappings. But before that first the table bit gets cleared using leaf level PTE_TABLE_BIT. Though functionally they are same, we should use page table level specific macros to be consistent as per the MMU specifications. Create page table level specific wrappers for kernel huge mapping entries and just drop mk_sect_prot() which does not have any other user. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-05-22Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix SPE probe failure when backing auxbuf with high-order pages - Fix handling of DMA allocations from outside of the vmalloc area - Fix generation of build-id ELF section for vDSO object - Disable huge I/O mappings if kernel page table dumping is enabled - A few other minor fixes (comments, kconfig etc) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vdso: Explicitly add build-id option arm64/mm: Inhibit huge-vmap with ptdump arm64: Print physical address of page table base in show_pte() arm64: don't trash config with compat symbol if COMPAT is disabled arm64: assembler: Update comment above cond_yield_neon() macro drivers/perf: arm_spe: Don't error on high-order pages for aux buf arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable
2019-05-16arm64/mm: Inhibit huge-vmap with ptdumpMark Rutland
The arm64 ptdump code can race with concurrent modification of the kernel page tables. At the time this was added, this was sound as: * Modifications to leaf entries could result in stale information being logged, but would not result in a functional problem. * Boot time modifications to non-leaf entries (e.g. freeing of initmem) were performed when the ptdump code cannot be invoked. * At runtime, modifications to non-leaf entries only occurred in the vmalloc region, and these were strictly additive, as intermediate entries were never freed. However, since commit: commit 324420bf91f6 ("arm64: add support for ioremap() block mappings") ... it has been possible to create huge mappings in the vmalloc area at runtime, and as part of this existing intermediate levels of table my be removed and freed. It's possible for the ptdump code to race with this, and continue to walk tables which have been freed (and potentially poisoned or reallocated). As a result of this, the ptdump code may dereference bogus addresses, which could be fatal. Since huge-vmap is a TLB and memory optimization, we can disable it when the runtime ptdump code is in use to avoid this problem. Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 324420bf91f60582 ("arm64: add support for ioremap() block mappings") Acked-by: Ard Biesheuvel <ard.biesheuvel@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-14treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>Masahiro Yamada
Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h>"), <asm/sizes.h> and <asm-generic/sizes.h> are just wrappers of <linux/sizes.h>. This commit replaces all <asm/sizes.h> and <asm-generic/sizes.h> to prepare for the removal. Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm, memory_hotplug: provide a more generic restrictions for memory hotplugMichal Hocko
arch_add_memory, __add_pages take a want_memblock which controls whether the newly added memory should get the sysfs memblock user API (e.g. ZONE_DEVICE users do not want/need this interface). Some callers even want to control where do we allocate the memmap from by configuring altmap. Add a more generic hotplug context for arch_add_memory and __add_pages. struct mhp_restrictions contains flags which contains additional features to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and altmap for alternative memmap allocator. This patch shouldn't introduce any functional change. [akpm@linux-foundation.org: build fix] Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-09arm64: mm: Consolidate early page table allocationWill Deacon
The logic for early allocation of page tables is duplicated between pgd_kernel_pgtable_alloc() and pgd_pgtable_alloc(). Drop the duplication by calling one from the other and renaming pgd_kernel_pgtable_alloc() accordingly. Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-09arm64: mm: don't call page table ctors for init_mmYu Zhao
init_mm doesn't require page table lock to be initialized at any level. Add a separate page table allocator for it, and the new one skips page table ctors. The ctors allocate memory when ALLOC_SPLIT_PTLOCKS is set. Not calling them avoids memory leak in case we call pte_free_kernel() on init_mm. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-09arm64: mm: use appropriate ctors for page tablesYu Zhao
For pte page, use pgtable_page_ctor(); for pmd page, use pgtable_pmd_page_ctor(); and for the rest (pud, p4d and pgd), don't use any. For now, we don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK and pgtable_pmd_page_ctor() is a nop. When we do in patch 3, we make sure pmd is not folded so we won't mistakenly call pgtable_pmd_page_ctor() on pud or p4d. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-03-12memblock: memblock_phys_alloc(): don't panicMike Rapoport
Make the memblock_phys_alloc() function an inline wrapper for memblock_phys_alloc_range() and update the memblock_phys_alloc() callers to check the returned value and panic in case of error. Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-01arm64: mmu: drop paging_init commentsPeng Fan
The comments could not reflect the code, and it is easy to get what this function does from a straight-line reading of the code. So let's drop the comments Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-01-21arm64: Remove asm/memblock.hWill Deacon
The arm64 asm/memblock.h header exists only to provide a function prototype for arm64_memblock_init(), which is called only from setup_arch(). Move the declaration into mmu.h, where it can live alongside other init functions such as paging_init() and bootmem_init() without the need for its own special header file. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-12-28lib/ioremap: ensure break-before-make is used for huge p4d mappingsWill Deacon
Whilst no architectures actually enable support for huge p4d mappings in the vmap area, the code that is implemented should be using break-before-make, as we do for pud and pmd huge entries. Link: http://lkml.kernel.org/r/1544120495-17438-6-git-send-email-will.deacon@arm.com Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()Will Deacon
The core code already has a check for pXd_none(), so remove it from the architecture implementation. Link: http://lkml.kernel.org/r/1544120495-17438-3-git-send-email-will.deacon@arm.com Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-12arm64: Add memory hotplug supportRobin Murphy
Wire up the basic support for hot-adding memory. Since memory hotplug is fairly tightly coupled to sparsemem, we tweak pfn_valid() to also cross-check the presence of a section in the manner of the generic implementation, before falling back to memblock to check for no-map regions within a present section as before. By having arch_add_memory(() create the linear mapping first, this then makes everything work in the way that __add_section() expects. We expect hotplug to be ACPI-driven, so the swapper_pg_dir updates should be safe from races by virtue of the global device hotplug lock. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: mm: EXPORT vabits_user to modulesWill Deacon
TASK_SIZE is defined using the vabits_user variable for 64-bit tasks, so ensure that this variable is exported to modules to avoid the following build breakage with allmodconfig: | ERROR: "vabits_user" [lib/test_user_copy.ko] undefined! | ERROR: "vabits_user" [drivers/misc/lkdtm/lkdtm.ko] undefined! | ERROR: "vabits_user" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: mm: introduce 52-bit userspace supportSteve Capper
On arm64 there is optional support for a 52-bit virtual address space. To exploit this one has to be running with a 64KB page size and be running on hardware that supports this. For an arm64 kernel supporting a 48 bit VA with a 64KB page size, some changes are needed to support a 52-bit userspace: * TCR_EL1.T0SZ needs to be 12 instead of 16, * TASK_SIZE needs to reflect the new size. This patch implements the above when the support for 52-bit VAs is detected at early boot time. On arm64 userspace addresses translation is controlled by TTBR0_EL1. As well as userspace, TTBR0_EL1 controls: * The identity mapping, * EFI runtime code. It is possible to run a kernel with an identity mapping that has a larger VA size than userspace (and for this case __cpu_set_tcr_t0sz() would set TCR_EL1.T0SZ as appropriate). However, when the conditions for 52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at 12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is disabled. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-20arm64: mm: apply r/o permissions of VM areas to its linear alias as wellArd Biesheuvel
On arm64, we use block mappings and contiguous hints to map the linear region, to minimize the TLB footprint. However, this means that the entire region is mapped using read/write permissions, which we cannot modify at page granularity without having to take intrusive measures to prevent TLB conflicts. This means the linear aliases of pages belonging to read-only mappings (executable or otherwise) in the vmalloc region are also mapped read/write, and could potentially be abused to modify things like module code, bpf JIT code or other read-only data. So let's fix this, by extending the set_memory_ro/rw routines to take the linear alias into account. The consequence of enabling this is that we can no longer use block mappings or contiguous hints, so in cases where the TLB footprint of the linear region is a bottleneck, performance may be affected. Therefore, allow this feature to be runtime en/disabled, by setting rodata=full (or 'on' to disable just this enhancement, or 'off' to disable read-only mappings for code and r/o data entirely) on the kernel command line. Also, allow the default value to be set via a Kconfig option. Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-08arm64: memblock: don't permit memblock resizing until linear mapping is upArd Biesheuvel
Bhupesh reports that having numerous memblock reservations at early boot may result in the following crash: Unable to handle kernel paging request at virtual address ffff80003ffe0000 ... Call trace: __memcpy+0x110/0x180 memblock_add_range+0x134/0x2e8 memblock_reserve+0x70/0xb8 memblock_alloc_base_nid+0x6c/0x88 __memblock_alloc_base+0x3c/0x4c memblock_alloc_base+0x28/0x4c memblock_alloc+0x2c/0x38 early_pgtable_alloc+0x20/0xb0 paging_init+0x28/0x7f8 This is caused by the fact that we permit memblock resizing before the linear mapping is up, and so the memblock_reserved() array is moved into memory that is not mapped yet. So let's ensure that this crash can no longer occur, by deferring to call to memblock_allow_resize() to after the linear mapping has been created. Reported-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-10-31memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*Mike Rapoport
Make it explicit that the caller gets a physical address rather than a virtual one. This will also allow using meblock_alloc prefix for memblock allocations returning virtual address, which is done in the following patches. The conversion is done using the following semantic patch: @@ expression e1, e2, e3; @@ ( - memblock_alloc(e1, e2) + memblock_phys_alloc(e1, e2) | - memblock_alloc_nid(e1, e2, e3) + memblock_phys_alloc_nid(e1, e2, e3) | - memblock_alloc_try_nid(e1, e2, e3) + memblock_phys_alloc_try_nid(e1, e2, e3) ) Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-22Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Apart from some new arm64 features and clean-ups, this also contains the core mmu_gather changes for tracking the levels of the page table being cleared and a minor update to the generic compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ. Summary: - Core mmu_gather changes which allow tracking the levels of page-table being cleared together with the arm64 low-level flushing routines - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to mitigate Spectre-v4 dynamically without trapping to EL3 firmware - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4 - Support for Common Not Private (CnP) translations allowing threads of the same CPU to share the TLB entries - Accelerated crc32 routines - Move swapper_pg_dir to the rodata section - Trap WFI instruction executed in user space - ARM erratum 1188874 workaround (arch_timer) - Miscellaneous fixes and clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work arm64: cpufeature: Trap CTR_EL0 access only where it is necessary arm64: cpufeature: Fix handling of CTR_EL0.IDC field arm64: cpufeature: ctr: Fix cpu capability check for late CPUs Documentation/arm64: HugeTLB page implementation arm64: mm: Use __pa_symbol() for set_swapper_pgd() arm64: Add silicon-errata.txt entry for ARM erratum 1188873 Revert "arm64: uaccess: implement unsafe accessors" arm64: mm: Drop the unused cpu parameter MAINTAINERS: fix bad sdei paths arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c arm64: xen: Use existing helper to check interrupt status arm64: Use daifflag_restore after bp_hardening arm64: daifflags: Use irqflags functions for daifflags arm64: arch_timer: avoid unused function warning arm64: Trap WFI executed in userspace arm64: docs: Document SSBS HWCAP arm64: docs: Fix typos in ELF hwcaps arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe ...
2018-10-10arm64: mm: Use __pa_symbol() for set_swapper_pgd()James Morse
commit 2330b7ca78350efcb ("arm64/mm: use fixmap to modify swapper_pg_dir") modifies the swapper_pg_dir via the fixmap as the kernel page tables have been moved to a read-only part of the kernel mapping. Using __pa() to setup the fixmap causes CONFIG_DEBUG_VIRTUAL to fire, as this function is used on the kernel-image swapper address. The in_swapper_pgdir() test before each call of this function means set_swapper_pgd() will only ever be called when pgdp points somewhere in the kernel-image mapping of swapper_pd_dir. Use __pa_symbol(). Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Jun Yao <yaojun8558363@gmail.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-25arm64/mm: use fixmap to modify swapper_pg_dirJun Yao
Once swapper_pg_dir is in the rodata section, it will not be possible to modify it directly, but we will need to modify it in some cases. To enable this, we can use the fixmap when deliberately modifying swapper_pg_dir. As the pgd is only transiently mapped, this provides some resilience against illicit modification of the pgd, e.g. for Kernel Space Mirror Attack (KSMA). Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: simplify ifdeffery, commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-25arm64/mm: Separate boot-time page tables from swapper_pg_dirJun Yao
Since the address of swapper_pg_dir is fixed for a given kernel image, it is an attractive target for manipulation via an arbitrary write. To mitigate this we'd like to make it read-only by moving it into the rodata section. We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir and reserved_ttbr0, so these will also need to move into rodata. However, swapper_pg_dir is allocated along with some transient page tables used for boot which we do not want to move into rodata. As a step towards this, this patch separates the boot-time page tables into a new init_pg_dir, and reduces swapper_pg_dir to the single page it needs to be. This allows us to retain the relationship between swapper_pg_dir, tramp_pg_dir, and swapper_pg_dir, while cleanly separating these from the boot-time page tables. The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during boot, and all of these levels will be freed when we switch to the swapper_pg_dir, which is initialized by the existing code in paging_init(). Since we start off on the init_pg_dir, we no longer need to allocate a transient page table in paging_init() in order to ensure that swapper_pg_dir isn't live while we initialize it. There should be no functional change as a result of this patch. Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: place init_pg_dir after BSS, fold mm changes, commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-06arm64: fix erroneous warnings in page freeing functionsMark Rutland
In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they hit a present non-table entry. In both cases we'll warn for non-present entries, as the VM_WARN_ON() only checks the entry is not a table entry. This has been observed to result in warnings when booting a v4.19-rc2 kernel under qemu. Fix this by bailing out earlier for non-present entries. Fixes: ec28bb9c9b0826d7 ("arm64: Implement page table free interfaces") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-06arm64: Implement page table free interfacesChintan Pandya
arm64 requires break-before-make. Originally, before setting up new pmd/pud entry for huge mapping, in few cases, the modifying pmd/pud entry was still valid and pointing to next level page table as we only clear off leaf PTE in unmap leg. a) This was resulting into stale entry in TLBs (as few TLBs also cache intermediate mapping for performance reasons) b) Also, modifying pmd/pud was the only reference to next level page table and it was getting lost without freeing it. So, page leaks were happening. Implement pud_free_pmd_page() and pmd_free_pte_page() to enforce BBM and also free the leaking page tables. Implementation requires, 1) Clearing off the current pud/pmd entry 2) Invalidation of TLB 3) Freeing of the un-used next level page tables Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-04ioremap: Update pgtable free interfaces with addrChintan Pandya
The following kernel panic was observed on ARM64 platform due to a stale TLB entry. 1. ioremap with 4K size, a valid pte page table is set. 2. iounmap it, its pte entry is set to 0. 3. ioremap the same address with 2M size, update its pmd entry with a new value. 4. CPU may hit an exception because the old pmd entry is still in TLB, which leads to a kernel panic. Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table") has addressed this panic by falling to pte mappings in the above case on ARM64. To support pmd mappings in all cases, TLB purge needs to be performed in this case on ARM64. Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page() so that TLB purge can be added later in seprate patches. [toshi.kani@hpe.com: merge changes, rewrite patch description] Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Will Deacon <will.deacon@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
2018-05-24arm64: Make sure permission updates happen for pmd/pudLaura Abbott
Commit 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") disallowed block mappings for ioremap since that code does not honor break-before-make. The same APIs are also used for permission updating though and the extra checks prevent the permission updates from happening, even though this should be permitted. This results in read-only permissions not being fully applied. Visibly, this can occasionaly be seen as a failure on the built in rodata test when the test data ends up in a section or as an odd RW gap on the page table dump. Fix this by using pgattr_change_is_safe instead of p*d_present for determining if the change is permitted. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Peter Robinson <pbrobinson@gmail.com> Reported-by: Peter Robinson <pbrobinson@gmail.com> Fixes: 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-22mm/vmalloc: add interfaces to free unmapped page tableToshi Kani
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create pud/pmd mappings. A kernel panic was observed on arm64 systems with Cortex-A75 in the following steps as described by Hanjun Guo. 1. ioremap a 4K size, valid page table will build, 2. iounmap it, pte0 will set to 0; 3. ioremap the same address with 2M size, pgd/pmd is unchanged, then set the a new value for pmd; 4. pte0 is leaked; 5. CPU may meet exception because the old pmd is still in TLB, which will lead to kernel panic. This panic is not reproducible on x86. INVLPG, called from iounmap, purges all levels of entries associated with purged address on x86. x86 still has memory leak. The patch changes the ioremap path to free unmapped page table(s) since doing so in the unmap path has the following issues: - The iounmap() path is shared with vunmap(). Since vmap() only supports pte mappings, making vunmap() to free a pte page is an overhead for regular vmap users as they do not need a pte page freed up. - Checking if all entries in a pte page are cleared in the unmap path is racy, and serializing this check is expensive. - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges. Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB purge. Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which clear a given pud/pmd entry and free up a page for the lower level entries. This patch implements their stub functions on x86 and arm64, which work as workaround. [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub] Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings") Reported-by: Lei Li <lious.lilei@hisilicon.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Wang Xuefeng <wxf.wang@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-26arm64: mm: fix thinko in non-global page table attribute checkArd Biesheuvel
The routine pgattr_change_is_safe() was extended in commit 4e6020565596 ("arm64: mm: Permit transitioning from Global to Non-Global without BBM") to permit changing the nG attribute from not set to set, but did so in a way that inadvertently disallows such changes if other permitted attribute changes take place at the same time. So update the code to take this into account. Fixes: 4e6020565596 ("arm64: mm: Permit transitioning from Global to ...") Cc: <stable@vger.kernel.org> # 4.14.x- Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-22arm64: Enforce BBM for huge IO/VMAP mappingsWill Deacon
ioremap_page_range doesn't honour break-before-make and attempts to put down huge mappings (using p*d_set_huge) over the top of pre-existing table entries. This leads to us leaking page table memory and also gives rise to TLB conflicts and spurious aborts, which have been seen in practice on Cortex-A75. Until this has been resolved, refuse to put block mappings when the existing entry is found to be present. Fixes: 324420bf91f60 ("arm64: add support for ioremap() block mappings") Reported-by: Hanjun Guo <hanjun.guo@linaro.org> Reported-by: Lei Li <lious.lilei@hisilicon.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-16arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tablesWill Deacon
In many cases, page tables can be accessed concurrently by either another CPU (due to things like fast gup) or by the hardware page table walker itself, which may set access/dirty bits. In such cases, it is important to use READ_ONCE/WRITE_ONCE when accessing page table entries so that entries cannot be torn, merged or subject to apparent loss of coherence due to compiler transformations. Whilst there are some scenarios where this cannot happen (e.g. pinned kernel mappings for the linear region), the overhead of using READ_ONCE /WRITE_ONCE everywhere is minimal and makes the code an awful lot easier to reason about. This patch consistently uses these macros in the arch code, as well as explicitly namespacing pointers to page table entries from the entries themselves by using adopting a 'p' suffix for the former (as is sometimes used elsewhere in the kernel source). Tested-by: Yury Norov <ynorov@caviumnetworks.com> Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "As I mentioned in the last pull request, there's a second batch of security updates for arm64 with mitigations for Spectre/v1 and an improved one for Spectre/v2 (via a newly defined firmware interface API). Spectre v1 mitigation: - back-end version of array_index_mask_nospec() - masking of the syscall number to restrict speculation through the syscall table - masking of __user pointers prior to deference in uaccess routines Spectre v2 mitigation update: - using the new firmware SMC calling convention specification update - removing the current PSCI GET_VERSION firmware call mitigation as vendors are deploying new SMCCC-capable firmware - additional branch predictor hardening for synchronous exceptions and interrupts while in user mode Meltdown v3 mitigation update: - Cavium Thunder X is unaffected but a hardware erratum gets in the way. The kernel now starts with the page tables mapped as global and switches to non-global if kpti needs to be enabled. Other: - Theoretical trylock bug fixed" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits) arm64: Kill PSCI_GET_VERSION as a variant-2 workaround arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: smccc: Implement SMCCC v1.1 inline primitive arm/arm64: smccc: Make function identifiers an unsigned quantity firmware/psci: Expose SMCCC version through psci_ops firmware/psci: Expose PSCI conduit arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: KVM: Turn kvm_psci_version into a static inline arm/arm64: KVM: Advertise SMCCC v1.1 arm/arm64: KVM: Implement PSCI 1.0 support arm/arm64: KVM: Add smccc accessors to PSCI code arm/arm64: KVM: Add PSCI_VERSION helper arm/arm64: KVM: Consolidate the PSCI include files arm64: KVM: Increment PC after handling an SMC trap arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: entry: Apply BP hardening for suspicious interrupts from EL0 arm64: entry: Apply BP hardening for high-priority synchronous exceptions arm64: futex: Mask __user pointers prior to dereference ...
2018-02-06arm64: mm: Permit transitioning from Global to Non-Global without BBMWill Deacon
Break-before-make is not needed when transitioning from Global to Non-Global mappings, provided that the contiguous hint is not being used. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>