1. 15 Sep, 2023 2 commits
    • absl: optimize Condition checks in Mutex code · 9a592abd
      1. Remove special handling of Condition::kTrue.
      
      Condition::kTrue is used very rarely (frequently its uses even indicate
      confusion and bugs). But we pay a few additional branches for kTrue
      on all Condition operations.
      Remove that special handling and simplify logic.
      
      2. And remove known_false condition in Mutex code.
      
      Checking known_false condition only causes a slowdown because:
      1. We already built the skip list with equivalent conditions
      (and keep improving it on every Skip call). And when we built
      the skip list, we used the more capable GuaranteedEqual function
      (it does not just check for equality of pointers,
      but also for equality of function/arg).
      
      2. Condition pointers are rarely equal even for equivalent conditions
      because temp Condition objects are usually created on the stack.
      We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
      but that slows down things even more (see point 1).
      
      So remove the known_false optimization.
      Benchmark results for this and the previous change:
      
      name                        old cpu/op   new cpu/op   delta
      BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
      BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
      BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
      BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
      
      PiperOrigin-RevId: 565669535
      Change-Id: I5180c4a787933d2ce477b004a111853753304684
      Dmitry Vyukov committed
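The pointer-versus-function/arg distinction in the commit above can be illustrated with a minimal sketch. `SketchCondition`, `SameObject`, `SameFunctionAndArg`, and `IsReady` are hypothetical names made up for this illustration. They are not Abseil's types, and the real GuaranteedEqual handles more cases than this.

```cpp
#include <cassert>

// Hypothetical stand-in for a condition: a predicate plus its argument.
struct SketchCondition {
  bool (*eval)(void*);
  void* arg;
};

// Pointer identity: the kind of check a `cond == known_false` test performs.
inline bool SameObject(const SketchCondition* a, const SketchCondition* b) {
  return a == b;
}

// Function/argument equality, in the spirit of the GuaranteedEqual check
// described in the commit message (assumption: simplified, no null handling).
inline bool SameFunctionAndArg(const SketchCondition& a,
                               const SketchCondition& b) {
  return a.eval == b.eval && a.arg == b.arg;
}

// Example predicate: reads a bool through the opaque argument.
inline bool IsReady(void* arg) {
  assert(arg != nullptr);
  return *static_cast<bool*>(arg);
}
```

Two temporaries built on the stack from the same predicate and argument are distinct objects, so `SameObject` reports a mismatch while `SameFunctionAndArg` still recognizes them as equivalent.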
    • Remove implicit int64_t->uint64_t conversion in ARM version of V128_Extract64 · c78a3f32
      PiperOrigin-RevId: 565662176
      Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
      Abseil Team committed
  2. 14 Sep, 2023 1 commit
  3. 13 Sep, 2023 2 commits
  4. 12 Sep, 2023 3 commits
  5. 11 Sep, 2023 2 commits
  6. 08 Sep, 2023 7 commits
    • Remove CordRepRing experiment. · efb035a5
      We have no intention to use it instead of the CordRepBtree implementation, so clean up and remove all code and references.
      
      PiperOrigin-RevId: 563803813
      Change-Id: I95a67318d0f722f3eb7ecdcc7b6c87e28f2e26dd
      Martijn Vels committed
    • Fix strict weak ordering in convert_test.cc · 09d29c58
      The test sorts NaNs, which made it flaky. The flakiness arises because the sorting checks randomize and check only 100 elements, whereas here we sort around a thousand.
      
      PiperOrigin-RevId: 563783036
      Change-Id: Id25bcb47483acf9c40be3fd1747c37d046197330
      Abseil Team committed
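For context on the strict weak ordering fix above: plain `<` on floating-point values is not a strict weak ordering once NaNs are present, because NaN is unordered against every value. The following is a minimal sketch of one common remedy, not the code from the commit. `NanLastLess` and `SortedNanLast` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical comparator: orders non-NaN values by '<' and places every
// NaN after them. All NaNs are equivalent to each other.
inline bool NanLastLess(double a, double b) {
  const bool a_nan = std::isnan(a);
  const bool b_nan = std::isnan(b);
  if (a_nan || b_nan) return !a_nan && b_nan;  // only "number < NaN" holds
  return a < b;
}

// Sorts a copy of the input with the NaN-aware comparator.
inline std::vector<double> SortedNanLast(std::vector<double> v) {
  std::sort(v.begin(), v.end(), NanLastLess);
  assert(std::is_sorted(v.begin(), v.end(), NanLastLess));
  return v;
}
```

With this comparator, equivalence is transitive again, so `std::sort` has well-defined behavior regardless of how many NaNs the input contains.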
    • Rollback: · 792e55fc
      absl: remove special handling of Condition::kTrue
      absl: remove known_false condition in Mutex code
      There are some test breakages.
      
      PiperOrigin-RevId: 563751370
      Change-Id: Ie14dc799e0a0d286a7e1b47f0a9bbe59dfb23f70
      Abseil Team committed
    • absl: remove leftovers of CondVar support for other mutexes · 6644e5bb
      When CondVar accepted generic non-Mutex mutexes,
      Mutex pointer could be nullptr. Now that support is removed,
      but we still have some lingering checks for Mutex* == nullptr.
      Remove them.
      
      PiperOrigin-RevId: 563740239
      Change-Id: Ib744e0b991f411dd8dba1b0da6477c13832e0f65
      Abseil Team committed
    • absl: inline and de-dup Mutex::Await/LockWhen/CondVar::Wait · 1cf6469b
      Mutex::Await/LockWhen/CondVar::Wait duplicate code, and cause additional
      calls at runtime and code bloat.
      Inline thin wrappers that just convert argument types and
      add a single de-duped implementation for these methods.
      
      This reduces code size, shaves off 55K from the mutex_test in release build,
      and should make things marginally faster.
      
      $ nm -nS mutex_test | egrep "(_ZN4absl5Mutex.*(Await|LockWhen))|(_ZN4absl7CondVar.*Wait)"
      
      before:
      00000000000912c0 00000000000001a8 T _ZN4absl7CondVar4WaitEPNS_5MutexE
      00000000000988c0 0000000000000c36 T _ZN4absl7CondVar16WaitWithDeadlineEPNS_5MutexENS_4TimeE
      000000000009a6e0 0000000000000041 T _ZN4absl5Mutex19LockWhenWithTimeoutERKNS_9ConditionENS_8DurationE
      00000000000a28c0 0000000000000779 T _ZN4absl5Mutex17AwaitWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf4e0 0000000000000011 T _ZN4absl5Mutex8LockWhenERKNS_9ConditionE
      00000000000cf500 0000000000000041 T _ZN4absl5Mutex20LockWhenWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf560 0000000000000011 T _ZN4absl5Mutex14ReaderLockWhenERKNS_9ConditionE
      00000000000cf580 0000000000000041 T _ZN4absl5Mutex26ReaderLockWhenWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf5e0 0000000000000766 T _ZN4absl5Mutex5AwaitERKNS_9ConditionE
      00000000000cfd60 00000000000007b5 T _ZN4absl5Mutex16AwaitWithTimeoutERKNS_9ConditionENS_8DurationE
      00000000000d0700 00000000000003cf T _ZN4absl7CondVar15WaitWithTimeoutEPNS_5MutexENS_8DurationE
      000000000011c280 0000000000000041 T _ZN4absl5Mutex25ReaderLockWhenWithTimeoutERKNS_9ConditionENS_8DurationE
      
      after:
      000000000009c300 00000000000007ed T _ZN4absl7CondVar10WaitCommonEPNS_5MutexENS_24synchronization_internal13KernelTimeoutE
      00000000000a03c0 00000000000006fe T _ZN4absl5Mutex11AwaitCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutE
      000000000011ae00 0000000000000025 T _ZN4absl5Mutex14LockWhenCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutEb
      PiperOrigin-RevId: 563729364
      Change-Id: Ic6b43761f76719c01e03d43cc0e0c419e41a85c1
      Abseil Team committed
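The de-duplication pattern in the commit above (thin inline wrappers that only convert arguments, plus one shared implementation) can be sketched as follows. `SketchTimeout` and `SketchCondVar` are hypothetical and do no real blocking. They only show the shape of the refactoring, not Abseil's actual code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timeout type standing in for a kernel-timeout abstraction.
struct SketchTimeout {
  bool has_timeout;
  std::chrono::nanoseconds remaining;

  static SketchTimeout Never() { return {false, std::chrono::nanoseconds(0)}; }
  static SketchTimeout After(std::chrono::nanoseconds d) { return {true, d}; }
};

class SketchCondVar {
 public:
  // Thin inline wrappers: they only convert argument types.
  void Wait() { WaitCommon(SketchTimeout::Never()); }
  bool WaitWithTimeout(std::chrono::nanoseconds d) {
    return WaitCommon(SketchTimeout::After(d));
  }

  int common_calls() const { return common_calls_; }

 private:
  // Single shared implementation. In a real library this would be defined
  // out of line so that only one copy of the slow path exists.
  // Placeholder logic: reports "timed out" for a non-positive timeout.
  bool WaitCommon(SketchTimeout t) {
    ++common_calls_;
    return t.has_timeout && t.remaining <= std::chrono::nanoseconds::zero();
  }

  int common_calls_ = 0;
};
```

Because every public entry point funnels into `WaitCommon`, the large function body is emitted once, which is consistent with the symbol listing shrinking from many large symbols to a few `*Common` ones.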
    • absl: remove known_false condition in Mutex code · b9980dd4
      Checking known_false condition only causes a slowdown because:
      1. We already built the skip list with equivalent conditions
      (and keep improving it on every Skip call). And when we built
      the skip list, we used the more capable GuaranteedEqual function
      (it does not just check for equality of pointers,
      but also for equality of function/arg).
      
      2. Condition pointers are rarely equal even for equivalent conditions
      because temp Condition objects are usually created on the stack.
      We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
      but that slows down things even more (see point 1).
      
      So remove the known_false optimization.
      Benchmark results for this and the previous change:
      
      name                        old cpu/op   new cpu/op   delta
      BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
      BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
      BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
      BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
      
      PiperOrigin-RevId: 563717887
      Change-Id: I9a62670628510d764a4f2f88a047abb8f85009e2
      Abseil Team committed
    • absl: remove special handling of Condition::kTrue · 38afe317
      Condition::kTrue is used very rarely (frequently its uses even indicate
      confusion and bugs). But we pay a few additional branches for kTrue
      on all Condition operations.
      Remove that special handling and simplify logic.
      PiperOrigin-RevId: 563691160
      Change-Id: I76125adde4872489da069dd9c894ed73a65d1d83
      Abseil Team committed
  7. 07 Sep, 2023 4 commits
  8. 06 Sep, 2023 3 commits
  9. 05 Sep, 2023 4 commits
    • Remove the unused LowerBoundAllocatedByteSize function. · 415a1d1c
      PiperOrigin-RevId: 562832827
      Change-Id: If37f83e67b3b2ea350f74dd6bffae51ea5508f12
      Evan Brown committed
    • Invert the "is inlined" bit of absl::Status · 5c9f72fa
      This change makes RepToPointer/PointerToRep have 0 instructions.
      This makes IsMovedFrom simpler (although this could always have left out the IsInlined check, since that bit can never be set on the aligned pointer).
      
      In exchange, it makes CodeToInlinedRep slower, but does not inhibit replacing it with a constant.
      InlinedRepToCode is unaffected.
      
      PiperOrigin-RevId: 562826801
      Change-Id: I2732f04ab293b773edc2efdec546b3a287b980c2
      Abseil Team committed
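The trade-off in the "is inlined" bit commit above can be sketched with a tagged `uintptr_t`. This is an illustration under assumptions, not the library's code: `SketchPayload` is hypothetical, the function names merely echo those in the commit message, and the shift amount of 2 is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heap-allocated payload; its alignment keeps the low bit of
// any pointer to it clear.
struct SketchPayload {
  int code;
};

// Low bit set means "inlined code"; an aligned pointer has the bit clear.
inline bool IsInlined(std::uintptr_t rep) { return (rep & 1) != 0; }

// With the tag on the inlined side, pointer conversion is a plain cast.
inline std::uintptr_t PointerToRep(const SketchPayload* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

inline const SketchPayload* RepToPointer(std::uintptr_t rep) {
  assert(!IsInlined(rep));
  return reinterpret_cast<const SketchPayload*>(rep);
}

// The inlined encoding now has to set the tag bit (assumed layout).
inline std::uintptr_t CodeToInlinedRep(int code) {
  return (static_cast<std::uintptr_t>(code) << 2) + 1;
}

inline int InlinedRepToCode(std::uintptr_t rep) {
  assert(IsInlined(rep));
  return static_cast<int>(rep >> 2);
}
```

The pointer path needs no arithmetic at all, while `CodeToInlinedRep` pays for the extra addition. When the code is a compile-time constant, the compiler can still fold that expression.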
    • Rollback adding support for ARM intrinsics · 461f1e49
      In some configurations this change causes compilation errors. We will roll this
      forward again after those issues are addressed.
      
      PiperOrigin-RevId: 562810916
      Change-Id: I45b2a8d456273e9eff188f36da8f11323c4dfe66
      Abseil Team committed
    • Add support for ARM intrinsics in crc_memcpy · 1a882833
      This change replaces inline x86 intrinsics with generic versions that compile
      for both x86 and ARM depending on the target arch.
      
      This change does not enable the accelerated crc memcpy engine on ARM. That will
      be done in a subsequent change after the optimal number of vector and integer
      regions for different CPUs is determined.
      
      PiperOrigin-RevId: 562785420
      Change-Id: I8ba4aa8de17587cedd92532f03767059a481f159
      Abseil Team committed
  10. 01 Sep, 2023 1 commit
  11. 31 Aug, 2023 3 commits
  12. 30 Aug, 2023 5 commits
    • Remove unused ReservedFlag. · a86bb8a9
      PiperOrigin-RevId: 561444343
      Change-Id: I26c648b28b626e11caa32b0a34aef92932d5ddb9
      Tomas Dzetkulic committed
    • Add CPU detection for Ampere Siryn · c99fbc0a
      PiperOrigin-RevId: 561444259
      Change-Id: I205ba9f11f4d41163ce74ae9cfa417fe500ccab3
      Abseil Team committed
    • Optimize Resize and Iteration on Arm · 37770938
      There are a few cycles of overhead when transferring between GPR and Neon registers. We pay this cost for GroupAarch64Impl, largely because the speedup we get in Match() makes it profitable. After a Match call, if we do subsequent Group operations, we don't have to pay the full GPR <-> Neon cost, so it makes sense to do them with Neon instructions as well.
      
      However, in iteration and find_first_non_full(), we do not do a prior Match(), so the Mask/Count EmptyOrDeleted calls pay the full GPR <-> Neon cost. We can avoid this by using the GPR versions of the functions in the portable implementation of Group instead. We slightly change the order of operations in these functions (should be functionally a nop) in order to take advantage of Arm's free flexible second operand shifts with Logical operations.
      
      Iteration and Resize are roughly 8% and 12.6% faster respectively.
      
      This is not profitable on x86 because there is much lower GPR <-> xmm register latency and we use a 16-bit wide Group size.
      
      PiperOrigin-RevId: 561415183
      Change-Id: I660b5bb84afedb05a12dcdf04d5b2e1514902760
      Connal de Souza committed
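To make the "GPR version" idea in the commit above concrete, here is a minimal sketch of an empty-or-deleted mask computed on a 64-bit word of eight control bytes using only integer operations. The control byte values and all names here are assumptions for illustration. The exact expression and operation order used in the commit are not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed control byte encoding: full slots have the high bit clear.
constexpr std::uint8_t kSketchEmpty = 0x80;
constexpr std::uint8_t kSketchDeleted = 0xFE;
constexpr std::uint8_t kSketchSentinel = 0xFF;

// Loads eight control bytes into one general-purpose register.
inline std::uint64_t LoadGroup(const std::uint8_t* ctrl) {
  assert(ctrl != nullptr);
  std::uint64_t word;
  std::memcpy(&word, ctrl, sizeof(word));
  return word;
}

// Sets the high bit of each byte that is empty or deleted: the byte's own
// high bit must be set and its low bit clear (which excludes the sentinel).
inline std::uint64_t MaskEmptyOrDeleted(std::uint64_t word) {
  constexpr std::uint64_t kMsbs = 0x8080808080808080ULL;
  return word & ~(word << 7) & kMsbs;
}

// Counts set bits in the mask, one per matching control byte.
inline int CountMatches(std::uint64_t mask) {
  int n = 0;
  for (; mask != 0; mask &= mask - 1) ++n;
  return n;
}
```

The shift feeding directly into a logical operation is the kind of pattern that a shifted second operand on Arm can absorb into a single instruction, which is the effect the commit message describes.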
    • Add missing #include options.h in optimization.h. · 99a3a6ae
      options.h was already included indirectly from config.h. This CL is just to include what you use.
      
      PiperOrigin-RevId: 561376910
      Change-Id: I5b96b2aedc1e02eddc049f5bf0e6faa91799930d
      Abseil Team committed
    • absl: fix a priority bug in CondVar wait morphing · b06ab1f3
      Enqueue updates priority of the queued thread.
      It was assumed that the queued thread is the current thread.
      But it's not the case in CondVar wait morphing,
      where we requeue an existing CondVar waiter on the Mutex.
      As a result, one thread can falsely get the priority of another thread.
      
      Fix this by not updating priority in this case.
      And make the assumption explicit and checked.
      
      PiperOrigin-RevId: 561249402
      Change-Id: I9476c047757090b893a88a2839b795b85fe220ad
      Abseil Team committed
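The shape of the priority fix above can be sketched as follows. `SketchWaiter`, `SketchMutexQueue`, and the `requeued_from_condvar` flag are hypothetical. The real Enqueue takes different parameters, and this only illustrates "skip the priority update when the waiter is not the current thread, and assert the assumption otherwise".

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread wait record.
struct SketchWaiter {
  int thread_id;
  int priority;
};

struct SketchMutexQueue {
  std::vector<SketchWaiter*> waiters;

  // Enqueues `w`. The priority is refreshed only when the waiter is the
  // calling thread; a waiter moved over from a CondVar keeps its own value.
  void Enqueue(SketchWaiter* w, int current_thread_id, int current_priority,
               bool requeued_from_condvar) {
    if (!requeued_from_condvar) {
      // The assumption is now explicit and checked.
      assert(w->thread_id == current_thread_id);
      w->priority = current_priority;
    }
    waiters.push_back(w);
  }
};
```

In the requeue case the calling thread's priority is never read into the waiter, so one thread can no longer inherit another thread's priority.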
  13. 29 Aug, 2023 3 commits