crc_x86_arm_combined.cc
29 KB
Optimize CRC32 for Ampere Siryn · ac364eb9
Siryn's crc32 instruction seems to have a latency of 3 and a throughput of 1, which puts the optimal ratio of pmull to crc streams close to that of the tested x86 machines. Up to +120% faster for large inputs.

PiperOrigin-RevId: 568645559
Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
Connal de Souza committed
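The latency and throughput figures matter because they bound how many independent crc32 dependency chains are needed to keep the unit busy. The following is a back-of-envelope sketch using a simple Little's-law model of a fully pipelined unit. It is not the stream-selection logic used in `crc_x86_arm_combined.cc`. The helper name `MinIndependentStreams` is hypothetical. It also assumes that "latency 3" means 3 cycles and "throughput 1" means one instruction issued per cycle; the commit message does not state units.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of crc_x86_arm_combined.cc).
// Little's law: to keep a fully pipelined unit saturated, the number of
// independent dependency chains in flight must be at least
// latency * throughput.
//   latency_cycles:       assumed cycles from issue to result of one crc32
//   per_cycle_throughput: assumed crc32 instructions started per cycle
int MinIndependentStreams(double latency_cycles, double per_cycle_throughput) {
  assert(latency_cycles > 0.0 && per_cycle_throughput > 0.0);
  // Round up: a fractional chain still requires a whole extra stream.
  return static_cast<int>(std::ceil(latency_cycles * per_cycle_throughput));
}
```

Under these assumptions, `MinIndependentStreams(3.0, 1.0)` returns 3, meaning about three interleaved crc32 streams would be needed before the crc32 unit stops being latency-bound. Any remaining input bandwidth can then be assigned to pmull-based folding streams. The real tuning in the file is empirical, as the "up to +120%" measurement suggests.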