Commit ac364eb9 by Connal de Souza, committed by Copybara-Service

Optimize CRC32 for Ampere Siryn

Siryn's crc32 instruction appears to have a latency of 3 cycles and a throughput of 1 per cycle, which puts the optimal ratio of pmull streams to crc streams close to that of the tested x86 machines. This makes CRC32 up to 120% faster for large inputs.
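A rough way to see why these figures favour three crc streams is Little's law: to keep a pipelined unit busy, about latency times throughput independent operations must be in flight, and each crc stream is a serial dependency chain that contributes at most one in-flight crc32 at a time. The minimal sketch below does that arithmetic. The function name is hypothetical, and reading the commit's figures as cycles and instructions per cycle is an assumption. If the first template argument in the diff is the crc stream count (also an assumption), the result is consistent with the 3 selected for Siryn.

```cpp
// Hypothetical helper, not part of the library: estimates how many
// independent dependency chains (streams) are needed to saturate a
// pipelined unit, using Little's law:
//   in-flight ops = latency (cycles) * throughput (ops per cycle).
// Throughput is passed as ops_per_window / window_cycles so that
// fractional rates stay in integer arithmetic.
constexpr int IndependentStreamsNeeded(int latency_cycles, int ops_per_window,
                                       int window_cycles) {
  // Integer ceil(latency_cycles * ops_per_window / window_cycles).
  return (latency_cycles * ops_per_window + window_cycles - 1) /
         window_cycles;
}

// Figures from the commit message, assumed to mean a 3-cycle latency and
// one crc32 issued per cycle.
constexpr int kSirynCrcStreams = IndependentStreamsNeeded(3, 1, 1);
static_assert(kSirynCrcStreams == 3, "3 cycles x 1 op/cycle = 3 streams");
```

For contrast, a hypothetical core that issues crc32 only every other cycle gives IndependentStreamsNeeded(3, 1, 2) == 2, so a third crc stream would add nothing there. The pmull side of the ratio is not modelled here.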

PiperOrigin-RevId: 568645559
Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
parent 2fa24cc4
@@ -636,6 +636,9 @@ CRCImpl* TryNewCRC32AcceleratedX86ARMCombined() {
     case CpuType::kArmNeoverseN1:
       return new CRC32AcceleratedX86ARMCombinedMultipleStreams<
           1, 1, CutoffStrategy::Unroll64CRC>();
+    case CpuType::kAmpereSiryn:
+      return new CRC32AcceleratedX86ARMCombinedMultipleStreams<
+          3, 2, CutoffStrategy::Fold3>();
 #if defined(__aarch64__)
     default:
       // Not all ARM processors support the needed instructions, so check here
......
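The template arguments choose how many independent streams run side by side. As a portable illustration of the multi-stream idea (not the library's implementation, which uses the hardware crc32 instruction for crc streams and carry-less multiply folding for pmull streams), the sketch below splits a buffer into three chunks, updates their CRCs in one interleaved loop so the three chains share no data dependency, and then stitches the results together. It assumes the CRC-32C (Castagnoli) polynomial; all function names are hypothetical. The combine step advances a CRC by feeding zero bytes, which is O(n) and only for clarity; fast implementations use precomputed constants instead.

```cpp
#include <cstddef>
#include <cstdint>

// Raw bitwise CRC-32C register update (reflected polynomial 0x82F63B78),
// with no initial or final inversion.
constexpr std::uint32_t Crc32cUpdate(std::uint32_t state, const char* data,
                                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    state ^= static_cast<unsigned char>(data[i]);
    for (int bit = 0; bit < 8; ++bit)
      state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
  }
  return state;
}

// Standard CRC-32C: initial value and final XOR are both 0xFFFFFFFF.
constexpr std::uint32_t Crc32c(const char* data, std::size_t n) {
  return Crc32cUpdate(0xFFFFFFFFu, data, n) ^ 0xFFFFFFFFu;
}

// Advances a finished CRC over n zero bytes using the raw register update.
// By linearity, crc(A || B) == ShiftByZeros(crc(A), |B|) ^ crc(B).
constexpr std::uint32_t ShiftByZeros(std::uint32_t crc, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

constexpr std::uint32_t Crc32cCombine(std::uint32_t crc_a,
                                      std::uint32_t crc_b,
                                      std::size_t len_b) {
  return ShiftByZeros(crc_a, len_b) ^ crc_b;
}

// Three-stream CRC-32C: chunks 0 and 1 hold n / 3 bytes each, chunk 2
// holds the remainder.
constexpr std::uint32_t Crc32cThreeStreams(const char* data, std::size_t n) {
  const std::size_t chunk = n / 3;
  const std::size_t len_c = n - 2 * chunk;
  std::uint32_t s0 = 0xFFFFFFFFu, s1 = 0xFFFFFFFFu, s2 = 0xFFFFFFFFu;
  // Interleaved loop: each step updates three independent chains, which a
  // core with a pipelined crc unit could overlap.
  for (std::size_t i = 0; i < chunk; ++i) {
    s0 = Crc32cUpdate(s0, data + i, 1);
    s1 = Crc32cUpdate(s1, data + chunk + i, 1);
    s2 = Crc32cUpdate(s2, data + 2 * chunk + i, 1);
  }
  // Leftover bytes of the last chunk when n is not a multiple of 3.
  s2 = Crc32cUpdate(s2, data + 3 * chunk, len_c - chunk);
  const std::uint32_t c0 = s0 ^ 0xFFFFFFFFu;
  const std::uint32_t c1 = s1 ^ 0xFFFFFFFFu;
  const std::uint32_t c2 = s2 ^ 0xFFFFFFFFu;
  return Crc32cCombine(Crc32cCombine(c0, c1, chunk), c2, len_c);
}
```

Crc32cThreeStreams should return the same value as Crc32c for any buffer; for the standard check string "123456789" both give 0xE3069283.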