1. 19 Sep, 2023 4 commits
    • Refactor for preliminary API update. · d91f39ab
      PiperOrigin-RevId: 566675048
      Change-Id: Ie598c21474858974e4b4adbad401c61a38924c98
      Abseil Team committed
    • Additional StrCat microbenchmarks. · bd467aad
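      A microbenchmark of this kind is conventionally written with Google
      Benchmark; the following is an illustrative sketch only (BM_StrCatSketch
      is a hypothetical name, not one of the commit's benchmarks):

        #include "absl/strings/str_cat.h"
        #include "benchmark/benchmark.h"

        static void BM_StrCatSketch(benchmark::State& state) {
          int i = 0;
          for (auto _ : state) {
            // Mixes string literals and an integer, a typical StrCat call shape.
            std::string s = absl::StrCat("prefix_", ++i, "_suffix");
            benchmark::DoNotOptimize(s);  // keep the result observable
          }
        }
        BENCHMARK(BM_StrCatSketch);
        BENCHMARK_MAIN();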
      PiperOrigin-RevId: 566650311
      Change-Id: Ibfabee88ea9999d08ade05ece362f5a075d19695
      Abseil Team committed
    • absl: speed up Mutex::Lock · cffc9ef2
      Currently Mutex::Lock contains a non-inlined, non-tail call chain:
      TryAcquireWithSpinning -> GetMutexGlobals -> LowLevelCallOnce -> init closure
      This turns the function into a non-leaf function, with stack frame
      allocation and additional register use. Remove this non-tail call to make
      the function a leaf, and move spin-iteration initialization to LockSlow
      (see the sketch after the listings below).
      
      Current Lock happy path:
      
      00000000001edc20 <absl::Mutex::Lock()>:
        1edc20:	55                   	push   %rbp
        1edc21:	48 89 e5             	mov    %rsp,%rbp
        1edc24:	53                   	push   %rbx
        1edc25:	50                   	push   %rax
        1edc26:	48 89 fb             	mov    %rdi,%rbx
        1edc29:	48 8b 07             	mov    (%rdi),%rax
        1edc2c:	a8 19                	test   $0x19,%al
        1edc2e:	75 0e                	jne    1edc3e <absl::Mutex::Lock()+0x1e>
        1edc30:	48 89 c1             	mov    %rax,%rcx
        1edc33:	48 83 c9 08          	or     $0x8,%rcx
        1edc37:	f0 48 0f b1 0b       	lock cmpxchg %rcx,(%rbx)
        1edc3c:	74 42                	je     1edc80 <absl::Mutex::Lock()+0x60>
        ... unhappy path ...
        1edc80:	48 83 c4 08          	add    $0x8,%rsp
        1edc84:	5b                   	pop    %rbx
        1edc85:	5d                   	pop    %rbp
        1edc86:	c3                   	ret
      
      New Lock happy path:
      
      00000000001eea80 <absl::Mutex::Lock()>:
        1eea80:	48 8b 07             	mov    (%rdi),%rax
        1eea83:	a8 19                	test   $0x19,%al
        1eea85:	75 0f                	jne    1eea96 <absl::Mutex::Lock()+0x16>
        1eea87:	48 89 c1             	mov    %rax,%rcx
        1eea8a:	48 83 c9 08          	or     $0x8,%rcx
        1eea8e:	f0 48 0f b1 0f       	lock cmpxchg %rcx,(%rdi)
        1eea93:	75 01                	jne    1eea96 <absl::Mutex::Lock()+0x16>
        1eea95:	c3                   	ret
        ... unhappy path ...
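      The same shape in source form, as a minimal sketch (FastMutex, kLocked,
      and LockSlow are illustrative names, not Abseil's internals):

        #include <atomic>
        #include <cstdint>

        class FastMutex {
         public:
          void Lock() {
            uintptr_t v = word_.load(std::memory_order_relaxed);
            // Happy path: a single CAS and no out-of-line calls, so the
            // compiler can emit a leaf function with no stack frame.
            if ((v & kLocked) == 0 &&
                word_.compare_exchange_strong(v, v | kLocked,
                                              std::memory_order_acquire)) {
              return;
            }
            LockSlow();
          }

          void Unlock() { word_.fetch_and(~kLocked, std::memory_order_release); }

         private:
          static constexpr uintptr_t kLocked = 0x8;

          // Slow path: per the commit, spin-iteration initialization moves
          // here, keeping Lock() itself free of non-tail calls.
          // (The noinline attribute is GCC/Clang syntax.)
          __attribute__((noinline)) void LockSlow() {
            uintptr_t v;
            do {
              while ((v = word_.load(std::memory_order_relaxed)) & kLocked) {
                // spin; a real implementation bounds this and then blocks
              }
            } while (!word_.compare_exchange_weak(v, v | kLocked,
                                                  std::memory_order_acquire));
          }

          std::atomic<uintptr_t> word_{0};
        };

        int main() {
          FastMutex m;
          m.Lock();
          m.Unlock();
          return 0;
        }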
      
      PiperOrigin-RevId: 566488042
      Change-Id: I62f854b82a322cfb1d42c34f8ed01b4677693fca
      Dmitry Vyukov committed
    • absl: requeue waiters as LIFO · a5dc018f
      Currently, if a thread has already blocked on a Mutex but then failed
      to acquire it, we queue it in FIFO order again. As a result, unlucky
      threads can suffer bad latency if they are requeued several times.
      The least we can do for them is to queue them in LIFO order after they
      have blocked.
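      A minimal sketch of the policy, using a hypothetical Waiter type and a
      plain deque in place of Mutex's intrusive waiter queue:

        #include <deque>

        struct Waiter {
          bool has_blocked_before = false;
        };

        void Enqueue(std::deque<Waiter*>& queue, Waiter* w) {
          if (w->has_blocked_before) {
            queue.push_front(w);  // LIFO: already-blocked waiters go first
          } else {
            queue.push_back(w);   // FIFO: fresh waiters wait their turn
          }
        }

        int main() {
          std::deque<Waiter*> q;
          Waiter fresh, retried;
          retried.has_blocked_before = true;
          Enqueue(q, &fresh);
          Enqueue(q, &retried);  // jumps ahead of `fresh`
          return q.front() == &retried ? 0 : 1;
        }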
      
      PiperOrigin-RevId: 566478783
      Change-Id: I8bac08325f20ff6ccc2658e04e1847fd4614c653
      Dmitry Vyukov committed
  2. 18 Sep, 2023 1 commit
  3. 15 Sep, 2023 7 commits
    • absl: remove special case for timed CondVar waits · 2c1e7e3c
      CondVar wait morphing has a special case for timed waits. The code goes
      back to 2006; there may have been reasons for it back then, but it no
      longer appears necessary. Wait morphing should work just fine after
      timed CondVar waits. Remove the special case and simplify the code.
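      For readers unfamiliar with the term, a hypothetical sketch of wait
      morphing itself (plain queues stand in for the real intrusive lists):
      on Signal, the waiter is transferred to the mutex's queue instead of
      being woken, so it does not wake up only to block again on the mutex:

        #include <deque>

        struct Waiter {};

        struct MutexQueue {
          std::deque<Waiter*> waiters;
        };

        struct CondVarQueue {
          std::deque<Waiter*> waiters;

          void Signal(MutexQueue& mu) {
            if (waiters.empty()) return;
            Waiter* w = waiters.front();
            waiters.pop_front();
            mu.waiters.push_back(w);  // morph: requeue on the mutex, no wakeup
          }
        };

        int main() {
          CondVarQueue cv;
          MutexQueue mu;
          Waiter w;
          cv.waiters.push_back(&w);
          cv.Signal(mu);
          return mu.waiters.size() == 1 ? 0 : 1;
        }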
      
      PiperOrigin-RevId: 565798838
      Change-Id: I4e4d61ae7ebd521f5c32dfc673e57a0c245e7cfb
      Dmitry Vyukov committed
    • Honor ABSL_MIN_LOG_LEVEL in CHECK_XX, CHECK_STRXX, CHECK_OK, and the QCHECK flavors of these. · 9356553a
      In particular, if ABSL_MIN_LOG_LEVEL exceeds kFatal, these should, upon failure, terminate the program without logging anything.  The lack of logging should be visible to the optimizer so that it can strip string literals and stringified variable names from the object file.
      
      Making some edge cases work under Clang required rewriting NormalizeLogSeverity to help make constraints on its return value more obvious to the optimizer.
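      A hedged sketch of the mechanism (MY_MIN_LOG_LEVEL, MY_CHECK, and
      kFatal's value are stand-ins, not Abseil's macros): because the level
      comparison is a compile-time constant, the optimizer can delete the
      logging branch and the stringified condition it references:

        #include <cstdio>
        #include <cstdlib>

        #ifndef MY_MIN_LOG_LEVEL
        #define MY_MIN_LOG_LEVEL 0
        #endif

        constexpr int kFatal = 3;

        #define MY_CHECK(cond)                                         \
          do {                                                         \
            if (!(cond)) {                                             \
              if (MY_MIN_LOG_LEVEL <= kFatal) {                        \
                std::fprintf(stderr, "Check failed: %s\n", #cond);     \
              }                                                        \
              std::abort();  /* always terminates, even when silent */ \
            }                                                          \
          } while (0)

        int main() {
          MY_CHECK(1 + 1 == 2);  // passes; no output, no abort
          return 0;
        }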
      
      PiperOrigin-RevId: 565792699
      Change-Id: Ibb6a47d4956191bbbd0297e04492cddc354578e2
      Andy Getzendanner committed
    • Fix a bug in which we used propagate_on_container_copy_assignment in btree move assignment. · f44e2cac
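      A sketch of the distinction behind the fix, on a hypothetical container
      (not Abseil's btree): move assignment must consult
      propagate_on_container_move_assignment, not the copy-assignment trait:

        #include <memory>
        #include <utility>

        template <typename T, typename Alloc = std::allocator<T>>
        class MyContainer {
         public:
          MyContainer& operator=(MyContainer&& other) noexcept {
            using Traits = std::allocator_traits<Alloc>;
            // The bug class: reading propagate_on_container_copy_assignment
            // here. Move assignment is governed by the move trait.
            if (Traits::propagate_on_container_move_assignment::value) {
              alloc_ = std::move(other.alloc_);
            }
            // ... move the contents (or relocate element-wise if the
            // allocators differ and do not propagate) ...
            return *this;
          }

         private:
          Alloc alloc_;
        };

        int main() {
          MyContainer<int> a, b;
          a = std::move(b);
          return 0;
        }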
      PiperOrigin-RevId: 565730754
      Change-Id: Id828847d32c812736669803c179351433dda4aa6
      Evan Brown committed
    • Move CountingAllocator into test_allocator.h and add some other allocators that… · 49be2e68
      Move CountingAllocator into test_allocator.h and add some other allocators that can be shared between different container tests.
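      An illustrative counting allocator in the same spirit (a sketch, not
      Abseil's actual CountingAllocator):

        #include <cstddef>
        #include <memory>
        #include <vector>

        template <typename T>
        struct CountingAlloc {
          using value_type = T;

          explicit CountingAlloc(std::size_t* bytes) : bytes_used(bytes) {}
          template <typename U>
          CountingAlloc(const CountingAlloc<U>& other)
              : bytes_used(other.bytes_used) {}

          T* allocate(std::size_t n) {
            *bytes_used += n * sizeof(T);
            return std::allocator<T>().allocate(n);
          }
          void deallocate(T* p, std::size_t n) {
            *bytes_used -= n * sizeof(T);
            std::allocator<T>().deallocate(p, n);
          }

          std::size_t* bytes_used;  // shared counter, observed by the test
        };

        template <typename T, typename U>
        bool operator==(const CountingAlloc<T>& a, const CountingAlloc<U>& b) {
          return a.bytes_used == b.bytes_used;
        }
        template <typename T, typename U>
        bool operator!=(const CountingAlloc<T>& a, const CountingAlloc<U>& b) {
          return !(a == b);
        }

        int main() {
          std::size_t bytes = 0;
          std::vector<int, CountingAlloc<int>> v(CountingAlloc<int>(&bytes));
          v.push_back(1);  // bytes now reflects the vector's allocation
          return bytes > 0 ? 0 : 1;
        }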
      
      PiperOrigin-RevId: 565693736
      Change-Id: I59af987e30da03a805ce59ff0fb7eeae3fc08293
      Evan Brown committed
    • Allow const qualified FunctionRef instances. This allows the signature to be… · e68f1412
      Allow const qualified FunctionRef instances. This allows the signature
      to be compatible with AnyInvocable for const uses.
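      A hedged usage sketch, assuming the change permits const-qualified
      signatures such as absl::FunctionRef<int() const>, mirroring
      absl::AnyInvocable:

        #include "absl/functional/any_invocable.h"
        #include "absl/functional/function_ref.h"

        // One signature type can now serve both the owning and the
        // non-owning callable wrapper for const uses.
        int CallBoth(absl::FunctionRef<int() const> ref,
                     const absl::AnyInvocable<int() const>& inv) {
          return ref() + inv();
        }

        int main() {
          auto f = [] { return 21; };
          absl::AnyInvocable<int() const> inv = f;
          return CallBoth(f, inv) == 42 ? 0 : 1;
        }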
      
      PiperOrigin-RevId: 565682320
      Change-Id: I924dadf110481e572bdb8af0111fa62d6f553d90
      Abseil Team committed
    • absl: optimize Condition checks in Mutex code · 9a592abd
      1. Remove special handling of Condition::kTrue.
      
      Condition::kTrue is used very rarely (frequently its uses even indicate
      confusion and bugs), but we pay a few additional branches for kTrue
      on all Condition operations.
      Remove that special handling and simplify the logic.

      2. Remove the known_false condition in Mutex code.

      Checking the known_false condition only causes slowdown because:
      1. We already build a skip list of equivalent conditions
      (and keep improving it on every Skip call), and when building
      the skip list we use the more capable GuaranteedEqual function
      (which checks not just for pointer equality,
      but also for equality of function/arg).

      2. Condition pointers are rarely equal even for equivalent conditions,
      because temporary Condition objects are usually created on the stack.
      We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
      but that slows things down even more (see point 1).

      So remove the known_false optimization
      (see the sketch after the benchmark table below).
      Benchmark results for this and the previous change:
      
      name                        old cpu/op   new cpu/op   delta
      BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
      BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
      BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
      BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
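      A sketch of the pointer-vs-predicate comparison described above (Cond
      and GuaranteedEqualSketch are illustrative, not Abseil's Condition
      API):

        struct Cond {
          bool (*fn)(void*);
          void* arg;
        };

        bool IsNonZero(void* p) { return *static_cast<int*>(p) != 0; }

        bool GuaranteedEqualSketch(const Cond* a, const Cond* b) {
          if (a == b) return true;                    // same object
          if (a == nullptr || b == nullptr) return false;
          return a->fn == b->fn && a->arg == b->arg;  // equivalent predicate
        }

        int main() {
          int x = 1;
          // Two distinct stack objects: pointer comparison says "different",
          // predicate comparison says "equivalent".
          Cond a{IsNonZero, &x}, b{IsNonZero, &x};
          return (&a != &b && GuaranteedEqualSketch(&a, &b)) ? 0 : 1;
        }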
      
      PiperOrigin-RevId: 565669535
      Change-Id: I5180c4a787933d2ce477b004a111853753304684
      Dmitry Vyukov committed
    • Remove implicit int64_t->uint64_t conversion in ARM version of V128_Extract64 · c78a3f32
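      The general fix pattern, as a small self-contained sketch
      (ExtractSignedLane is a hypothetical stand-in for the NEON intrinsic):

        #include <cstdint>

        // Stand-in for an ARM intrinsic that yields a signed lane value.
        int64_t ExtractSignedLane() { return -1; }

        // Returning uint64_t from an int64_t expression converts implicitly;
        // an explicit static_cast documents intent and avoids
        // implicit-conversion warnings.
        uint64_t Extract64() {
          return static_cast<uint64_t>(ExtractSignedLane());
        }

        int main() { return Extract64() == UINT64_MAX ? 0 : 1; }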
      PiperOrigin-RevId: 565662176
      Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
      Abseil Team committed
  4. 14 Sep, 2023 1 commit
  5. 13 Sep, 2023 2 commits
  6. 12 Sep, 2023 3 commits
  7. 11 Sep, 2023 2 commits
  8. 08 Sep, 2023 7 commits
    • Remove CordRepRing experiment. · efb035a5
      We have no intention of using it instead of the CordRepBtree
      implementation, so clean up and remove all code and references.
      
      PiperOrigin-RevId: 563803813
      Change-Id: I95a67318d0f722f3eb7ecdcc7b6c87e28f2e26dd
      Martijn Vels committed
    • Fix strict weak ordering in convert_test.cc · 09d29c58
      The test sorts NaNs, and it became flaky. The flakiness arises because
      the sorting checks randomize and check 100 elements, but we sort around
      a thousand here.
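      A sketch of the underlying issue and one conventional remedy
      (illustrative; not necessarily the fix applied in convert_test.cc):

        #include <algorithm>
        #include <cmath>
        #include <vector>

        // operator< on doubles is not a strict weak ordering once NaNs are
        // present (NaN compares false against everything), so std::sort may
        // misbehave or trip a library's randomized ordering checks. Giving
        // NaNs a definite position restores the ordering.
        bool NanLast(double a, double b) {
          if (std::isnan(a)) return false;  // NaN sorts after everything
          if (std::isnan(b)) return true;
          return a < b;
        }

        int main() {
          std::vector<double> v = {3.0, std::nan(""), 1.0, 2.0};
          std::sort(v.begin(), v.end(), NanLast);  // well-defined with NaNs
          return 0;
        }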
      
      PiperOrigin-RevId: 563783036
      Change-Id: Id25bcb47483acf9c40be3fd1747c37d046197330
      Abseil Team committed
    • Rollback: · 792e55fc
      absl: remove special handling of Condition::kTrue
      absl: remove known_false condition in Mutex code
      There are some test breakages.
      
      PiperOrigin-RevId: 563751370
      Change-Id: Ie14dc799e0a0d286a7e1b47f0a9bbe59dfb23f70
      Abseil Team committed
    • absl: remove leftovers of CondVar support for other mutexes · 6644e5bb
      When CondVar accepted generic non-Mutex mutexes, the Mutex pointer
      could be nullptr. That support has since been removed, but some
      lingering checks for Mutex* == nullptr remain. Remove them.
      
      PiperOrigin-RevId: 563740239
      Change-Id: Ib744e0b991f411dd8dba1b0da6477c13832e0f65
      Abseil Team committed
    • absl: inline and de-dup Mutex::Await/LockWhen/CondVar::Wait · 1cf6469b
      Mutex::Await/LockWhen/CondVar::Wait duplicate code and cause additional
      calls at runtime and code bloat. Inline the thin wrappers that just
      convert argument types, and add a single de-duped implementation for
      these methods (see the sketch after the symbol listings below).

      This reduces code size, shaving 55K off mutex_test in a release build,
      and should make things marginally faster.
      
      $ nm -nS mutex_test | egrep "(_ZN4absl5Mutex.*(Await|LockWhen))|(_ZN4absl7CondVar.*Wait)"
      
      before:
      00000000000912c0 00000000000001a8 T _ZN4absl7CondVar4WaitEPNS_5MutexE
      00000000000988c0 0000000000000c36 T _ZN4absl7CondVar16WaitWithDeadlineEPNS_5MutexENS_4TimeE
      000000000009a6e0 0000000000000041 T _ZN4absl5Mutex19LockWhenWithTimeoutERKNS_9ConditionENS_8DurationE
      00000000000a28c0 0000000000000779 T _ZN4absl5Mutex17AwaitWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf4e0 0000000000000011 T _ZN4absl5Mutex8LockWhenERKNS_9ConditionE
      00000000000cf500 0000000000000041 T _ZN4absl5Mutex20LockWhenWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf560 0000000000000011 T _ZN4absl5Mutex14ReaderLockWhenERKNS_9ConditionE
      00000000000cf580 0000000000000041 T _ZN4absl5Mutex26ReaderLockWhenWithDeadlineERKNS_9ConditionENS_4TimeE
      00000000000cf5e0 0000000000000766 T _ZN4absl5Mutex5AwaitERKNS_9ConditionE
      00000000000cfd60 00000000000007b5 T _ZN4absl5Mutex16AwaitWithTimeoutERKNS_9ConditionENS_8DurationE
      00000000000d0700 00000000000003cf T _ZN4absl7CondVar15WaitWithTimeoutEPNS_5MutexENS_8DurationE
      000000000011c280 0000000000000041 T _ZN4absl5Mutex25ReaderLockWhenWithTimeoutERKNS_9ConditionENS_8DurationE
      
      after:
      000000000009c300 00000000000007ed T _ZN4absl7CondVar10WaitCommonEPNS_5MutexENS_24synchronization_internal13KernelTimeoutE
      00000000000a03c0 00000000000006fe T _ZN4absl5Mutex11AwaitCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutE
      000000000011ae00 0000000000000025 T _ZN4absl5Mutex14LockWhenCommonERKNS_9ConditionENS_24synchronization_internal13KernelTimeoutEb
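      A minimal sketch of the pattern (Mu and Timeout are hypothetical
      stand-ins for Mutex and synchronization_internal::KernelTimeout):

        struct Timeout {
          static Timeout Never() { return Timeout{}; }
          static Timeout After(double /*seconds*/) { return Timeout{}; }
        };

        class Mu {
         public:
          // Thin inline wrappers: argument conversion only, no duplicated
          // logic and no extra out-of-line calls.
          void LockWhen(bool (*cond)()) {
            LockWhenCommon(cond, Timeout::Never());
          }
          bool LockWhenWithTimeout(bool (*cond)(), double secs) {
            return LockWhenCommon(cond, Timeout::After(secs));
          }

         private:
          // The single shared implementation, like LockWhenCommon in the
          // "after" listing; only this symbol carries the logic.
          bool LockWhenCommon(bool (*cond)(), Timeout /*t*/) {
            // A real implementation would acquire the lock and wait on cond
            // under the timeout.
            return cond();
          }
        };

        bool Ready() { return true; }

        int main() {
          Mu mu;
          mu.LockWhen(Ready);
          return mu.LockWhenWithTimeout(Ready, 1.0) ? 0 : 1;
        }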
      PiperOrigin-RevId: 563729364
      Change-Id: Ic6b43761f76719c01e03d43cc0e0c419e41a85c1
      Abseil Team committed
    • absl: remove known_false condition in Mutex code · b9980dd4
      Checking the known_false condition only causes slowdown because:
      1. We already build a skip list of equivalent conditions
      (and keep improving it on every Skip call), and when building
      the skip list we use the more capable GuaranteedEqual function
      (which checks not just for pointer equality,
      but also for equality of function/arg).

      2. Condition pointers are rarely equal even for equivalent conditions,
      because temporary Condition objects are usually created on the stack.
      We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
      but that slows things down even more (see point 1).

      So remove the known_false optimization.
      Benchmark results for this and the previous change:
      
      name                        old cpu/op   new cpu/op   delta
      BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
      BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
      BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
      BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
      
      PiperOrigin-RevId: 563717887
      Change-Id: I9a62670628510d764a4f2f88a047abb8f85009e2
      Abseil Team committed
    • absl: remove special handling of Condition::kTrue · 38afe317
      Condition::kTrue is used very rarely (frequently its uses even indicate
      confusion and bugs), but we pay a few additional branches for kTrue
      on all Condition operations.
      Remove that special handling and simplify the logic.
      PiperOrigin-RevId: 563691160
      Change-Id: I76125adde4872489da069dd9c894ed73a65d1d83
      Abseil Team committed
  9. 07 Sep, 2023 4 commits
  10. 06 Sep, 2023 3 commits
  11. 05 Sep, 2023 4 commits
    • Remove the unused LowerBoundAllocatedByteSize function. · 415a1d1c
      PiperOrigin-RevId: 562832827
      Change-Id: If37f83e67b3b2ea350f74dd6bffae51ea5508f12
      Evan Brown committed
    • Invert the "is inlined" bit of absl::Status · 5c9f72fa
      This change makes RepToPointer/PointerToRep compile to zero
      instructions. It also makes IsMovedFrom simpler (although it could
      always have left out the IsInlined check, since that bit can never be
      set on an aligned pointer).

      In exchange, it makes CodeToInlinedRep slower, but does not prevent it
      from being replaced with a constant. InlinedRepToCode is unaffected.
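      An illustrative sketch of the representation trick (hypothetical
      layout, not absl::Status's actual encoding): when the tag bit marks
      inlined values rather than pointers, a pointer rep is just the pointer,
      so the conversions compile away:

        #include <cstdint>

        constexpr uintptr_t kInlinedTag = 1;

        inline void* RepToPointer(uintptr_t rep) {
          return reinterpret_cast<void*>(rep);  // no masking: tag bit is 0
        }
        inline uintptr_t PointerToRep(void* p) {
          // An aligned pointer never has the low bit set.
          return reinterpret_cast<uintptr_t>(p);
        }
        inline uintptr_t CodeToInlinedRep(int code) {
          // Slightly more work than before, but with a constant `code` the
          // whole expression still folds to a compile-time constant.
          return (static_cast<uintptr_t>(code) << 1) | kInlinedTag;
        }
        inline bool IsInlined(uintptr_t rep) {
          return (rep & kInlinedTag) != 0;
        }

        int main() {
          uintptr_t rep = CodeToInlinedRep(5);
          return IsInlined(rep) ? 0 : 1;
        }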
      
      PiperOrigin-RevId: 562826801
      Change-Id: I2732f04ab293b773edc2efdec546b3a287b980c2
      Abseil Team committed
    • Rollback adding support for ARM intrinsics · 461f1e49
      In some configurations this change causes compilation errors. We will
      roll this forward again after those issues are addressed.
      
      PiperOrigin-RevId: 562810916
      Change-Id: I45b2a8d456273e9eff188f36da8f11323c4dfe66
      Abseil Team committed
    • Add support for ARM intrinsics in crc_memcpy · 1a882833
      This change replaces inline x86 intrinsics with generic versions that compile
      for both x86 and ARM depending on the target arch.
      
      This change does not enable the accelerated crc memcpy engine on ARM. That will
      be done in a subsequent change after the optimal number of vector and integer
      regions for different CPUs is determined.
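      A minimal sketch of the generic-wrapper approach (V128 and V128_Load
      are illustrative names):

        #include <cstdint>
        #include <cstring>

        #if defined(__x86_64__) || defined(_M_X64)
        #include <emmintrin.h>
        using V128 = __m128i;
        inline V128 V128_Load(const void* p) {
          return _mm_loadu_si128(static_cast<const __m128i*>(p));
        }
        #elif defined(__aarch64__)
        #include <arm_neon.h>
        using V128 = uint64x2_t;
        inline V128 V128_Load(const void* p) {
          return vld1q_u64(static_cast<const uint64_t*>(p));
        }
        #else
        // Scalar fallback so the sketch stays portable.
        struct V128 { uint64_t lo, hi; };
        inline V128 V128_Load(const void* p) {
          V128 v;
          std::memcpy(&v, p, sizeof(v));
          return v;
        }
        #endif

        int main() {
          alignas(16) uint64_t buf[2] = {1, 2};
          V128 v = V128_Load(buf);  // same call site on every target arch
          (void)v;
          return 0;
        }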
      
      PiperOrigin-RevId: 562785420
      Change-Id: I8ba4aa8de17587cedd92532f03767059a481f159
      Abseil Team committed
  12. 01 Sep, 2023 1 commit
  13. 31 Aug, 2023 1 commit