powerpc: Add 64bit optimised memcmp

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s
modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
author: Anton Blanchard <anton@samba.org> 2015-01-21 12:27:38 +1100
committer: Michael Ellerman <mpe@ellerman.id.au> 2015-01-23 14:02:55 +1100
commit: 15c2d45d17418cc4a712608c78ff3b5f0583d83b (patch)
tree: 53e4ee00f5e0b604ee7451ee6e229751043ae0f6 /arch/powerpc/lib/string.S
parent: a113de373bcb7651196e29a49483c8e24e1e6aa9 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 1b5a0a09d609..c80fb49ce607 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -93,6 +93,7 @@ _GLOBAL(strlen)
 	subf	r3,r3,r4
 	blr
 
+#ifdef CONFIG_PPC32
 _GLOBAL(memcmp)
 	PPC_LCMPI 0,r5,0
 	beq-	2f
@@ -106,6 +107,7 @@ _GLOBAL(memcmp)
 	blr
 2:	li	r3,0
 	blr
+#endif
 
 _GLOBAL(memchr)
 	PPC_LCMPI 0,r5,0
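
The hunks shown here only guard the old byte loop with CONFIG_PPC32. The 64-bit routine itself is not part of this view. As a rough illustration of the long-comparison path the commit message describes (at least 32 bytes, 8 byte aligned, unrolled 8 byte loads), here is a minimal C sketch. It is not the kernel's assembly: the names memcmp_sketch, load_be64 and CMP8 are hypothetical, the alignment check on both pointers is an assumption, and the modulo scheduling (overlapping the loads of one iteration with the compares of the previous one) is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble 8 bytes most-significant-first, so that an
 * unsigned 64-bit compare orders the same way as a byte-by-byte compare.
 * On a big-endian machine a plain 8 byte load has this property. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Compare one 8 byte word at offset off; return from the caller on mismatch. */
#define CMP8(off) do {						\
	uint64_t x_ = load_be64(a + (off));			\
	uint64_t y_ = load_be64(b + (off));			\
	if (x_ != y_)						\
		return x_ < y_ ? -1 : 1;			\
} while (0)

/* Sketch only: same return convention as memcmp (negative, zero, positive). */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;

	/* Long path: assumed to require both pointers 8 byte aligned. */
	if (n >= 32 && (((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		while (n >= 32) {
			/* Unrolled four times: 32 bytes per iteration. */
			CMP8(0);
			CMP8(8);
			CMP8(16);
			CMP8(24);
			a += 32;
			b += 32;
			n -= 32;
		}
	}

	/* Short, unaligned, or tail bytes: one byte at a time. */
	while (n--) {
		if (*a != *b)
			return *a < *b ? -1 : 1;
		a++;
		b++;
	}
	return 0;
}
```

Comparing whole words keeps the loop branch count per byte low, which is where the byte-at-a-time loop loses most of its time. The byte loop still handles everything the word loop cannot.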