Throughput of the 64-byte chunk loop inside `LowLevelHash` (now in `LowLevelHashLenGt16`) is limited by the loop-carried dependency on `current_state`. By using 4 states instead of 2, we can shorten this dependency chain by 1 cycle. On Skylake, it drops from 9 cycles to 8 cycles per iteration (12.5% faster asymptotically).

To see the reduction in a simplified version of the `LowLevelHash` implementation on Skylake:

* Before: https://godbolt.org/z/Tcj9vsGax, llvm-mca (https://godbolt.org/z/3o78Msr63) shows 9 cycles / iteration.
* After: https://godbolt.org/z/q4GM4EjPr, llvm-mca (https://godbolt.org/z/W5d1KEMzq) shows 8 cycles / iteration.
* This CL removes 1 xor (1 cycle) per iteration from the critical path.

A block for a 32-byte chunk is also added.

Finally, just before returning, `Mix` is called once instead of twice.

PiperOrigin-RevId: 605090653
Change-Id: Ib7517ebb8bef7484066cd14cf41a943953e93377
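The effect of moving from 2 to 4 states can be sketched roughly as follows. This is a simplified illustration, not the actual Abseil code: `Mix`, `Load64`, `kSalt`, `HashChunksTwoStates`, and `HashChunksFourStates` are hypothetical names, the salt values are arbitrary illustrative constants, and `unsigned __int128` assumes GCC or Clang. Only the full 64-byte chunks are processed; tail handling is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: 64x64 -> 128-bit multiply, then fold the two halves.
// Relies on the unsigned __int128 extension (GCC/Clang).
inline uint64_t Mix(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  return static_cast<uint64_t>(p) ^ static_cast<uint64_t>(p >> 64);
}

// Hypothetical helper: unaligned 8-byte load in native byte order.
inline uint64_t Load64(const unsigned char* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Arbitrary constants chosen for illustration only.
constexpr uint64_t kSalt[4] = {0x243F6A8885A308D3ULL, 0x13198A2E03707344ULL,
                               0xA4093822299F31D0ULL, 0x082EFA98EC4E6C89ULL};

// Two states: each state is rebuilt from two Mix results, so the
// loop-carried chain per iteration is xor -> multiply -> fold -> xor.
uint64_t HashChunksTwoStates(const unsigned char* p, size_t len,
                             uint64_t seed) {
  uint64_t s0 = seed, s1 = seed;
  for (; len >= 64; p += 64, len -= 64) {
    uint64_t a = Mix(Load64(p) ^ kSalt[0], Load64(p + 8) ^ s0);
    uint64_t b = Mix(Load64(p + 16) ^ kSalt[1], Load64(p + 24) ^ s0);
    uint64_t c = Mix(Load64(p + 32) ^ kSalt[2], Load64(p + 40) ^ s1);
    uint64_t d = Mix(Load64(p + 48) ^ kSalt[3], Load64(p + 56) ^ s1);
    s0 = a ^ b;  // combining xor sits on the critical path
    s1 = c ^ d;
  }
  return s0 ^ s1;
}

// Four states: each state depends only on its own previous value, so the
// combining xor moves out of the loop and the chain is one xor shorter.
uint64_t HashChunksFourStates(const unsigned char* p, size_t len,
                              uint64_t seed) {
  uint64_t s0 = seed, s1 = seed, s2 = seed, s3 = seed;
  for (; len >= 64; p += 64, len -= 64) {
    s0 = Mix(Load64(p) ^ kSalt[0], Load64(p + 8) ^ s0);
    s1 = Mix(Load64(p + 16) ^ kSalt[1], Load64(p + 24) ^ s1);
    s2 = Mix(Load64(p + 32) ^ kSalt[2], Load64(p + 40) ^ s2);
    s3 = Mix(Load64(p + 48) ^ kSalt[3], Load64(p + 56) ^ s3);
  }
  return (s0 ^ s1) ^ (s2 ^ s3);  // states are combined once, after the loop
}
```

The two functions intentionally produce different hash values; the point is only the shape of the dependency chain, which is what llvm-mca measures in the links above.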
| Name |
|---|
| .github |
| CMake |
| absl |
| ci |
| .clang-format |
| .gitignore |
| ABSEIL_ISSUE_TEMPLATE.md |
| AUTHORS |
| BUILD.bazel |
| CMakeLists.txt |
| CONTRIBUTING.md |
| FAQ.md |
| LICENSE |
| MODULE.bazel |
| PrivacyInfo.xcprivacy |
| README.md |
| UPGRADES.md |
| WORKSPACE |
| WORKSPACE.bzlmod |
| conanfile.py |
| create_lts.py |