26 May, 2022 (3 commits)
    • Enable __thread on Asylo · 89cdaed6
      PiperOrigin-RevId: 451201387
      Change-Id: Ibeac4f24d00e28bbfc61e476936d669321a2cb24
      Abseil Team committed
    • Add implementation of is_invocable_r to absl::base_internal for C++ < 17, define it as an alias of std::is_invocable_r when C++ >= 17 · 0bc4bc23
      
      PiperOrigin-RevId: 451171660
      Change-Id: I6dc0e40eabac72b82c4a19e292158e43118cb080
      Dino Radakovic committed
    • Optimize SwissMap iteration for aarch64 by 5-6% · 591a2cda
      Benchmarks: https://pastebin.com/tZ7dr67W. It works especially well on smaller ranges.
      
      After a week spent optimizing NEON SIMD, where I almost managed to make hash tables work with NEON SIMD without performance hits (one cycle left to optimize before I gave up), I found an interesting optimization for aarch64: the cls instruction (count leading sign bits).
      
      The loop has the property that the ctrl_ group is only matched when the first slot is empty or deleted:
      
      ```
      void skip_empty_or_deleted() {
        while (IsEmptyOrDeleted(*ctrl_)) {
          uint32_t shift = Group{ctrl_}.CountLeadingEmptyOrDeleted();
          ctrl_ += shift;
          slot_ += shift;
        }
        ...
      }
      ```
      
      However, `kEmpty` and `kDeleted` have the bit pattern `1xxxxxx0`, so the per-byte mask `~ctrl & (ctrl >> 7)` always has its lowest bit set to 1 for those bytes.
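
      As a quick sanity check on that bit trick, here is a minimal scalar model (the function name and the scalar framing are mine; the real code operates on whole NEON vectors), assuming the standard SwissMap control-byte encodings:

      ```cpp
      #include <cstdint>

      // Control-byte encodings (from the Abseil source):
      //   kEmpty    = 0b10000000, kDeleted = 0b11111110,
      //   kSentinel = 0b11111111, full slots are 0b0xxxxxxx.
      // Empty/deleted bytes end in 0 and start with 1, so ~ctrl contributes
      // a 1 in bit 0, and the arithmetic shift smears the sign bit across
      // the byte; full and sentinel bytes yield 0.
      int8_t EmptyOrDeletedMask(int8_t ctrl) {
        // Note: >> on a negative value is arithmetic on mainstream
        // compilers (and guaranteed since C++20).
        return static_cast<int8_t>(~ctrl & (ctrl >> 7));
      }
      ```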
      
      The naive implementation adds 1 so that it can count leading zero bits; on aarch64 we can start counting one bits immediately. This saves one cycle and about 5% of iteration performance.
      
      The hard part is finding a supported, sustainable way to express this in C++.
      
      `__clsll` is not supported by GCC and has been available only since clang 8; `__builtin_clrsb` does not produce optimal codegen in clang. `__rbit` is not supported by GCC, which has no intrinsic for it, but clang provides `__builtin_bitreverse{32,64}`. For now I decided to enable this only for clang, and only when the appropriate builtins are available.
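
      For reference, the portable counting path can be modeled in scalar code like this. This is a sketch under assumptions: the function name and the packing (slot i in byte i of a little-endian 64-bit mask whose bytes hold the `~ctrl & (ctrl >> 7)` result above) are mine, and the real code works on NEON registers; the clang/aarch64 path instead counts the one bits directly via bit-reverse + cls, skipping the inversion shown here.

      ```cpp
      #include <cstdint>

      // Counts how many slots at the front of an 8-slot control group are
      // empty or deleted. Byte i of `mask` holds ~ctrl[i] & (ctrl[i] >> 7),
      // so empty/deleted slots have bit 0 of their byte set.
      uint32_t CountLeadingEmptyOrDeleted(uint64_t mask) {
        constexpr uint64_t kLsbs = 0x0101010101010101ULL;
        // Bytes whose low bit is clear mark the first slot that is neither
        // empty nor deleted; this inversion is the extra step that counting
        // one bits directly (cls) avoids.
        uint64_t stops = ~mask & kLsbs;
        if (stops == 0) return 8;  // whole group is empty or deleted
        // Trailing-zero count / 8 gives the index of the first stop byte
        // (__builtin_ctzll is a GCC/clang builtin).
        return static_cast<uint32_t>(__builtin_ctzll(stops)) >> 3;
      }
      ```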
      
      PiperOrigin-RevId: 451168570
      Change-Id: I7e9256a60aecdc88ced4e6eb15ebc257281b6664
      Abseil Team committed