absl/hash/internal/low_level_hash_test.cc · master · open / abseil-cpp

Optimize `absl::Hash` by making `LowLevelHash` faster. · 3e59efa2

Throughput of the 64 byte chunk loop inside `LowLevelHash` (or now in `LowLevelHashLenGt16`) gets limited by the loop carried dependency on `current_state`. By using 4 states instead of 2, we can reduce this duration by 1 cycle. On Skylake, it is reduced from 9 cycles to 8 cycles (12.5% faster asymptotically).

To see the reduction in a simplified version of `LowLevelHash` implementation on Skylake:
* Before: https://godbolt.org/z/Tcj9vsGax, llvm-mca (https://godbolt.org/z/3o78Msr63) shows 9 cycles / iteration.
* After: https://godbolt.org/z/q4GM4EjPr, llvm-mca (https://godbolt.org/z/W5d1KEMzq) shows 8 cycles / iteration.
* This CL is removing 1 xor (1 cycle) per iteration from the critical path.

A block for 32 byte chunk is also added.

Finally, just before returning, `Mix` is called 1 time instead of twice.

PiperOrigin-RevId: 605090653
Change-Id: Ib7517ebb8bef7484066cd14cf41a943953e93377

committed Feb 07, 2024

3e59efa2

low_level_hash_test.cc 26.3 KB

Edit