mm, locking: Rework {set,clear,mm}_tlb_flush_pending()

Commit: af2c1401e6f9 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates") added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we can solve the same problem without this barrier. If instead we mandate that mm_tlb_flush_pending() is used while holding the PTL we're guaranteed to observe prior set_tlb_flush_pending() instances. For this to work we need to rework migrate_misplaced_transhuge_page() a little and move the test up into do_huge_pmd_numa_page(). NOTE: this relies on flush_tlb_range() to guarantee: (1) it ensures that prior page table updates are visible to the page table walker and (2) it ensures that subsequent memory accesses are only made visible after the invalidation has completed This is required for architectures that implement TRANSPARENT_HUGEPAGE (arc, arm, arm64, mips, powerpc, s390, sparc, x86) or otherwise use mm_tlb_flush_pending() in their page-table operations (arm, arm64, x86). This appears true for: - arm (DSB ISB before and after), - arm64 (DSB ISHST before, and DSB ISH after), - powerpc (PTESYNC before and after), - s390 and x86 TLB invalidate are serializing instructions But I failed to understand the situation for: - arc, mips, sparc Now SPARC64 is a wee bit special in that flush_tlb_range() is a no-op and it flushes the TLBs using arch_{enter,leave}_lazy_mmu_mode() inside the PTL. It still needs to guarantee the PTL unlock happens _after_ the invalidate completes. Vineet, Ralf and Dave could you guys please have a look? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Peter Zijlstra <peterz@infradead.org> 2017-06-07 18:05:07 +0200
committer: Ingo Molnar <mingo@kernel.org> 2017-08-10 12:29:01 +0200
commit: 8b1b436dd1ccc8a1770af6e56eec047ad4920659 (patch)
tree: c88cd33c0241d73b18a4e41d813584ca4a7da971 /mm/huge_memory.c
parent: 706eeb3e9c6f032f2d22a1c658624cfb6ace61d4 (diff)
1 files changed, 20 insertions, 0 deletions
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86975dec0ba1..c76a720b936b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1410,6 +1410,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	int page_nid = -1, this_nid = numa_node_id();
 	int target_nid, last_cpupid = -1;
+	bool need_flush = false;
 	bool page_locked;
 	bool migrated = false;
 	bool was_writable;
@@ -1496,10 +1497,29 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	}
 
 	/*
+	 * Since we took the NUMA fault, we must have observed the !accessible
+	 * bit. Make sure all other CPUs agree with that, to avoid them
+	 * modifying the page we're about to migrate.
+	 *
+	 * Must be done under PTL such that we'll observe the relevant
+	 * set_tlb_flush_pending().
+	 */
+	if (mm_tlb_flush_pending(vma->vm_mm))
+		need_flush = true;
+
+	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
 	 * and access rights restored.
 	 */
 	spin_unlock(vmf->ptl);
+
+	/*
+	 * We are not sure a pending tlb flush here is for a huge page
+	 * mapping or not. Hence use the tlb range variant
+	 */
+	if (need_flush)
+		flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
+
 	migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
 				vmf->pmd, pmd, vmf->address, page, target_nid);
 	if (migrated) {
author	Peter Zijlstra <peterz@infradead.org>	2017-06-07 18:05:07 +0200
committer	Ingo Molnar <mingo@kernel.org>	2017-08-10 12:29:01 +0200
commit	8b1b436dd1ccc8a1770af6e56eec047ad4920659 (patch)
tree	c88cd33c0241d73b18a4e41d813584ca4a7da971 /mm/huge_memory.c
parent	706eeb3e9c6f032f2d22a1c658624cfb6ace61d4 (diff)