Commit 03786143 by Dmitry Vyukov Committed by Copybara-Service

Optimize prefetch codegen.

Currently we use the "r" constraint to pass the prefetched address.
This forces the compiler to actually put it into a register.
As a result, some uses look like:

 16bfb7c:	48 01 cf             	add    %rcx,%rdi
 16bfb7f:	0f 0d 0f             	prefetchw (%rdi)
--
 16bfccf:	48 83 c1 60          	add    $0x60,%rcx
 16bfcd3:	0f 0d 09             	prefetchw (%rcx)

Use the "m" constraint instead. It's more relaxed: it only requires
the address to be materialized in some form, using whatever
addressing modes the target supports (e.g. x86 off(base, index, scale)).
With the change, the same code becomes:

 16bfb7c:	0f 0d 0c 39          	prefetchw (%rcx,%rdi,1)
--
 16bfccf:	0f 0d 49 60          	prefetchw 0x60(%rcx)

PiperOrigin-RevId: 574723975
Change-Id: Id0c8645f8c702d1842685343901da321f6513156
parent 9687a8ea
@@ -158,7 +158,7 @@ ABSL_ATTRIBUTE_ALWAYS_INLINE inline void PrefetchToLocalCacheForWrite(
// manually emit prefetchw. PREFETCHW is recognized as a no-op on older Intel
// processors and has been present on AMD processors since the K6-2.
#if defined(__x86_64__)
-asm("prefetchw (%0)" : : "r"(addr));
+asm("prefetchw %0" : : "m"(*reinterpret_cast<const char*>(addr)));
#else
__builtin_prefetch(addr, 1, 3);
#endif
@@ -187,7 +187,7 @@ ABSL_ATTRIBUTE_ALWAYS_INLINE inline void PrefetchToLocalCacheForWrite(
// up, PREFETCHW is recognized as a no-op on older Intel processors
// and has been present on AMD processors since the K6-2. We have this
// disabled for MSVC compilers as this miscompiles on older MSVC compilers.
-asm("prefetchw (%0)" : : "r"(addr));
+asm("prefetchw %0" : : "m"(*reinterpret_cast<const char*>(addr)));
#endif
}