Optimize CRC32 for Ampere Siryn
Siryn's crc32 instruction seems to have latency 3 and throughput 1, which makes the optimal ratio of pmull and crc streams close to that of tested x86 machines. Up to +120% faster for large inputs. PiperOrigin-RevId: 568645559 Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
Showing
Please
register
or
sign in
to comment