1. 26 Sep, 2023 5 commits
  2. 23 Sep, 2023 1 commit
  3. 22 Sep, 2023 1 commit
  4. 21 Sep, 2023 4 commits
    • Mutex: Rollback requeuing waiters as LIFO · 90e8f6f7
      PiperOrigin-RevId: 567415671
      Change-Id: I59bfcb5ac9fbde227a4cdb3b497b0bd5969b0770
      Abseil Team committed
    • Optimize CRC32 Extend for large inputs on Arm · aa3c949a
      This is a temporary workaround for an apparent compiler bug with pmull(2) instructions. The current hot loop looks like this:
      
      mov	w14, #0xef02
      lsl	x15, x15, #6
      mov	x13, xzr
      movk	w14, #0x740e, lsl #16
      sub	x15, x15, #0x40
      ldr	q4, [x16, #0x4e0]
      
      _LOOP_START:
      add	x16, x9, x13
      add	x17, x12, x13
      fmov	d19, x14            <--------- This is loop invariant and expensive
      add	x13, x13, #0x40
      cmp	x15, x13
      prfm	pldl1keep, [x16, #0x140]
      prfm	pldl1keep, [x17, #0x140]
      ldp	x18, x0, [x16, #0x40]
      crc32cx	w10, w10, x18
      ldp	x2, x18, [x16, #0x50]
      crc32cx	w10, w10, x0
      crc32cx	w10, w10, x2
      ldp	x0, x2, [x16, #0x60]
      crc32cx	w10, w10, x18
      ldp	x18, x16, [x16, #0x70]
      pmull2	v5.1q, v1.2d, v4.2d
      pmull2	v6.1q, v0.2d, v4.2d
      pmull2	v7.1q, v2.2d, v4.2d
      pmull2	v16.1q, v3.2d, v4.2d
      ldp	q17, q18, [x17, #0x40]
      crc32cx	w10, w10, x0
      pmull	v1.1q, v1.1d, v19.1d
      crc32cx	w10, w10, x2
      pmull	v0.1q, v0.1d, v19.1d
      crc32cx	w10, w10, x18
      pmull	v2.1q, v2.1d, v19.1d
      crc32cx	w10, w10, x16
      pmull	v3.1q, v3.1d, v19.1d
      ldp	q20, q21, [x17, #0x60]
      eor	v1.16b, v17.16b, v1.16b
      eor	v0.16b, v18.16b, v0.16b
      eor	v1.16b, v1.16b, v5.16b
      eor	v2.16b, v20.16b, v2.16b
      eor	v0.16b, v0.16b, v6.16b
      eor	v3.16b, v21.16b, v3.16b
      eor	v2.16b, v2.16b, v7.16b
      eor	v3.16b, v3.16b, v16.16b
      b.ne	_LOOP_START
      
      There is a redundant fmov that moves the same constant into a Neon register on every loop iteration for use by the PMULL instructions, even though the PMULL2 instructions already have this constant loaded into a Neon register. After this change, both the PMULL and PMULL2 instructions use the values in q4, which are not reloaded every iteration. The fmov was expensive because it contends for execution units with the crc32cx instructions. This change is up to 20% faster for large inputs.
      
      PiperOrigin-RevId: 567391972
      Change-Id: I4c8e49750cfa5cc5730c3bb713bd9fd67657804a
      Connal de Souza committed
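The fix is an instance of loop-invariant code motion at the register level. As a rough scalar analogy (hypothetical function name, using the constant 0x740eef02 assembled by the mov/movk pair above), the folding constant should be materialized once before the loop rather than rebuilt each iteration:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the pattern behind the fix: the folding constant is
// materialized once before the loop (like keeping the value live in q4)
// instead of being rebuilt every iteration (the redundant per-iteration fmov).
uint64_t FoldHoisted(const std::vector<uint64_t>& data) {
  const uint64_t kFoldConst = 0x740eef02;  // loaded once, stays in a register
  uint64_t acc = 0;
  for (uint64_t v : data) {
    acc = (acc >> 1) ^ (v * kFoldConst);  // constant reused, never reloaded
  }
  return acc;
}
```

On the real hot loop the equivalent hoisting matters because rebuilding the constant competes with the crc32cx instructions for execution units.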
    • Replace BtreeAllocatorTest with individual test cases for copy/move/swap… · 821756c3
      Replace BtreeAllocatorTest with individual test cases for copy/move/swap propagation (defined in test_allocator.h) and minimal alignment.
      
      Also remove some extraneous value_types from typed tests. The motivation is to reduce btree_test compile time.
      
      PiperOrigin-RevId: 567376572
      Change-Id: I6ac6130b99faeadaedab8c2c7b05d5e23e77cc1e
      Evan Brown committed
    • Rollback "absl: speed up Mutex::Lock" · e313f0ed
      There are some regressions reported.
      
      PiperOrigin-RevId: 567181925
      Change-Id: I4ee8a61afd336de7ecb22ec307adb2068932bc8b
      Dmitry Vyukov committed
  5. 20 Sep, 2023 6 commits
  6. 19 Sep, 2023 4 commits
    • Refactor for preliminary API update. · d91f39ab
      PiperOrigin-RevId: 566675048
      Change-Id: Ie598c21474858974e4b4adbad401c61a38924c98
      Abseil Team committed
    • Additional StrCat microbenchmarks. · bd467aad
      PiperOrigin-RevId: 566650311
      Change-Id: Ibfabee88ea9999d08ade05ece362f5a075d19695
      Abseil Team committed
    • absl: speed up Mutex::Lock · cffc9ef2
      Currently Mutex::Lock contains a non-inlined, non-tail call chain:
      TryAcquireWithSpinning -> GetMutexGlobals -> LowLevelCallOnce -> init closure
      This turns the function into a non-leaf function with stack frame allocation
      and additional register use. Remove this non-tail call to make the function a leaf,
      and move spin-iteration initialization to LockSlow.
      
      Current Lock happy path:
      
      00000000001edc20 <absl::Mutex::Lock()>:
        1edc20:	55                   	push   %rbp
        1edc21:	48 89 e5             	mov    %rsp,%rbp
        1edc24:	53                   	push   %rbx
        1edc25:	50                   	push   %rax
        1edc26:	48 89 fb             	mov    %rdi,%rbx
        1edc29:	48 8b 07             	mov    (%rdi),%rax
        1edc2c:	a8 19                	test   $0x19,%al
        1edc2e:	75 0e                	jne    1edc3e <absl::Mutex::Lock()+0x1e>
        1edc30:	48 89 c1             	mov    %rax,%rcx
        1edc33:	48 83 c9 08          	or     $0x8,%rcx
        1edc37:	f0 48 0f b1 0b       	lock cmpxchg %rcx,(%rbx)
        1edc3c:	74 42                	je     1edc80 <absl::Mutex::Lock()+0x60>
        ... unhappy path ...
        1edc80:	48 83 c4 08          	add    $0x8,%rsp
        1edc84:	5b                   	pop    %rbx
        1edc85:	5d                   	pop    %rbp
        1edc86:	c3                   	ret
      
      New Lock happy path:
      
      00000000001eea80 <absl::Mutex::Lock()>:
        1eea80:	48 8b 07             	mov    (%rdi),%rax
        1eea83:	a8 19                	test   $0x19,%al
        1eea85:	75 0f                	jne    1eea96 <absl::Mutex::Lock()+0x16>
        1eea87:	48 89 c1             	mov    %rax,%rcx
        1eea8a:	48 83 c9 08          	or     $0x8,%rcx
        1eea8e:	f0 48 0f b1 0f       	lock cmpxchg %rcx,(%rdi)
        1eea93:	75 01                	jne    1eea96 <absl::Mutex::Lock()+0x16>
        1eea95:	c3                   	ret
        ... unhappy path ...
      
      PiperOrigin-RevId: 566488042
      Change-Id: I62f854b82a322cfb1d42c34f8ed01b4677693fca
      Dmitry Vyukov committed
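The shape of this optimization can be sketched with a toy lock (illustrative names, not Abseil's actual internals): the uncontended path is a single CAS with no callees, so the compiler can emit it as a leaf function without a stack frame, and everything else lives in an out-of-line slow path:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative sketch of the fast-path/slow-path split described above.
class TinyLock {
 public:
  void Lock() {
    uintptr_t expected = 0;
    // Fast path: a single lock cmpxchg, no callees, no stack frame needed.
    if (state_.compare_exchange_strong(expected, 1,
                                       std::memory_order_acquire)) {
      return;
    }
    LockSlow();  // all setup (e.g. spin-iteration init) lives off the hot path
  }
  bool IsLocked() const { return state_.load(std::memory_order_relaxed) != 0; }
  void Unlock() { state_.store(0, std::memory_order_release); }

 private:
  // GCC/Clang attribute keeps the slow path out of line so it cannot
  // pessimize the fast path's register allocation.
  __attribute__((noinline)) void LockSlow() {
    uintptr_t expected;
    do {  // spin until the CAS succeeds; real code would bound the spin
      expected = 0;
    } while (!state_.compare_exchange_weak(expected, 1,
                                           std::memory_order_acquire));
  }
  std::atomic<uintptr_t> state_{0};
};
```

This mirrors the before/after disassembly above: the old Lock pushed %rbp/%rbx and adjusted %rsp, while the new leaf version is just test, cmpxchg, ret.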
    • absl: requeue waiters as LIFO · a5dc018f
      Currently, if a thread has already blocked on a Mutex but then fails
      to acquire it, we queue it in FIFO order again.
      As a result, unlucky threads can suffer bad latency
      if they are requeued several times.
      The least we can do for them is to queue them in LIFO order after blocking.
      
      PiperOrigin-RevId: 566478783
      Change-Id: I8bac08325f20ff6ccc2658e04e1847fd4614c653
      Dmitry Vyukov committed
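A toy model of this queueing policy (illustrative only, not the real waiter list): fresh waiters join at the back in FIFO order, but a waiter that already blocked and then lost the race goes to the front, so it is served next:

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model of the requeue policy described above.
struct WaiterQueue {
  std::deque<std::string> q;
  // A thread waiting for the first time joins in FIFO order.
  void EnqueueNew(const std::string& w) { q.push_back(w); }
  // A thread that already blocked but failed to acquire is requeued LIFO,
  // bounding how many times it can be passed over.
  void RequeueAfterBlock(const std::string& w) { q.push_front(w); }
  std::string PopNext() {
    std::string w = q.front();
    q.pop_front();
    return w;
  }
};
```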
  7. 18 Sep, 2023 1 commit
  8. 15 Sep, 2023 7 commits
    • absl: remove special case for timed CondVar waits · 2c1e7e3c
      CondVar wait morphing has a special case for timed waits.
      The code goes back to 2006; there may have been reasons
      to do this back then, but it no longer appears necessary.
      Wait morphing should work just fine after timed CondVar waits.
      Remove the special case and simplify the code.
      
      PiperOrigin-RevId: 565798838
      Change-Id: I4e4d61ae7ebd521f5c32dfc673e57a0c245e7cfb
      Dmitry Vyukov committed
    • Honor ABSL_MIN_LOG_LEVEL in CHECK_XX, CHECK_STRXX, CHECK_OK, and the QCHECK flavors of these. · 9356553a
      In particular, if ABSL_MIN_LOG_LEVEL exceeds kFatal, these should, upon failure, terminate the program without logging anything.  The lack of logging should be visible to the optimizer so that it can strip string literals and stringified variable names from the object file.
      
      Making some edge cases work under Clang required rewriting NormalizeLogSeverity to help make constraints on its return value more obvious to the optimizer.
      
      PiperOrigin-RevId: 565792699
      Change-Id: Ibb6a47d4956191bbbd0297e04492cddc354578e2
      Andy Getzendanner committed
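The intended behavior can be sketched with a simplified macro (hypothetical names, not the real Abseil macros): when the compiled-in minimum log level is above fatal, a failed check terminates without constructing any log message, so the stringified condition never reaches the object file:

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative sketch only. With the minimum log level above fatal, the
// failure branch terminates without referencing any message text, so the
// optimizer can strip string literals and stringified variable names.
#define SKETCH_MIN_LOG_LEVEL_ABOVE_FATAL 1

#if SKETCH_MIN_LOG_LEVEL_ABOVE_FATAL
#define SKETCH_CHECK(cond) ((cond) ? (void)0 : std::abort())
#else
// The logging flavor would stringify the condition, keeping #cond alive:
// #define SKETCH_CHECK(cond) ((cond) ? (void)0 : Die(__FILE__, __LINE__, #cond))
#endif
```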
    • Fix a bug in which we used propagate_on_container_copy_assignment in btree move assignment. · f44e2cac
      PiperOrigin-RevId: 565730754
      Change-Id: Id828847d32c812736669803c179351433dda4aa6
      Evan Brown committed
    • Move CountingAllocator into test_allocator.h and add some other allocators that… · 49be2e68
      Move CountingAllocator into test_allocator.h and add some other allocators that can be shared between different container tests.
      
      PiperOrigin-RevId: 565693736
      Change-Id: I59af987e30da03a805ce59ff0fb7eeae3fc08293
      Evan Brown committed
    • Allow const qualified FunctionRef instances. This allows the signature to be… · e68f1412
      Allow const qualified FunctionRef instances. This allows the signature to be compatible with AnyInvokable for const uses.
      
      PiperOrigin-RevId: 565682320
      Change-Id: I924dadf110481e572bdb8af0111fa62d6f553d90
      Abseil Team committed
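Why the const qualifier matters can be sketched with a stripped-down FunctionRef-like type (illustrative, not Abseil's implementation): a const-qualified operator() lets the wrapper be invoked through a const reference, matching how a const AnyInvocable would be used:

```cpp
#include <cassert>

// Minimal FunctionRef-like wrapper for int(int), with a const call operator.
class IntFnRef {
 public:
  template <typename F>
  IntFnRef(const F& f)
      : obj_(&f), invoke_([](const void* o, int x) {
          return (*static_cast<const F*>(o))(x);
        }) {}
  // const-qualified: callable even when the IntFnRef itself is const.
  int operator()(int x) const { return invoke_(obj_, x); }

 private:
  const void* obj_;               // non-owning reference to the callable
  int (*invoke_)(const void*, int);
};

// Without the const qualifier on operator(), this would not compile.
int CallThroughConstRef(const IntFnRef& f) { return f(21); }
```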
    • absl: optimize Condition checks in Mutex code · 9a592abd
      1. Remove special handling of Condition::kTrue.
      
      Condition::kTrue is used very rarely (frequently its uses even indicate
      confusion and bugs), yet we pay a few additional branches for kTrue
      on all Condition operations.
      Remove that special handling and simplify the logic.
      
      2. Remove the known_false condition in Mutex code.
      
      Checking the known_false condition only causes slowdown because:
      1. We already built a skip list of equivalent conditions
      (and keep improving it on every Skip call), and when we built
      the skip list we used the more capable GuaranteedEqual function
      (it does not just check pointer equality,
      but also equality of function/arg).
      
      2. Condition pointers are rarely equal even for equivalent conditions
      because temporary Condition objects are usually created on the stack.
      We could call GuaranteedEqual(cond, known_false) instead of cond == known_false,
      but that slows things down even more (see point 1).
      
      So remove the known_false optimization.
      Benchmark results for this and the previous change:
      
      name                        old cpu/op   new cpu/op   delta
      BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
      BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
      BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
      BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%     ~     (p=0.056 n=5+5)
      BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
      BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%     ~     (p=0.548 n=5+5)
      BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%     ~     (p=0.222 n=5+5)
      BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
      BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
      
      PiperOrigin-RevId: 565669535
      Change-Id: I5180c4a787933d2ce477b004a111853753304684
      Dmitry Vyukov committed
    • Remove implicit int64_t->uint64_t conversion in ARM version of V128_Extract64 · c78a3f32
      PiperOrigin-RevId: 565662176
      Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
      Abseil Team committed
  9. 14 Sep, 2023 1 commit
  10. 13 Sep, 2023 2 commits
  11. 12 Sep, 2023 3 commits
  12. 11 Sep, 2023 2 commits
  13. 08 Sep, 2023 3 commits
    • Remove CordRepRing experiment. · efb035a5
      We have no intention of using it instead of the CordRepBtree implementation, so clean up and remove all code and references.
      
      PiperOrigin-RevId: 563803813
      Change-Id: I95a67318d0f722f3eb7ecdcc7b6c87e28f2e26dd
      Martijn Vels committed
    • Fix strict weak ordering in convert_test.cc · 09d29c58
      It sorts NaNs, which made the test flaky: the sort-order checks randomize and verify only about 100 elements, while we sort around a thousand here.
      
      PiperOrigin-RevId: 563783036
      Change-Id: Id25bcb47483acf9c40be3fd1747c37d046197330
      Abseil Team committed
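The underlying issue can be sketched as follows (illustrative comparator, not the test's exact code): raw operator< on doubles violates strict weak ordering once NaNs are present, which randomized order checks catch nondeterministically, while a comparator that deterministically orders NaNs last restores a strict weak ordering:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// NaN-aware comparator: all NaNs compare equivalent and sort after every
// non-NaN value, so the comparator is a valid strict weak ordering.
bool NanLastLess(double a, double b) {
  if (std::isnan(b)) return !std::isnan(a);  // non-NaN < NaN; NaN !< NaN
  if (std::isnan(a)) return false;           // NaN is never less than non-NaN
  return a < b;
}
```

With plain operator<, `a < b` and `b < a` are both false for a NaN paired with any value, which falsely marks unequal elements as equivalent and breaks transitivity of equivalence.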
    • Rollback: · 792e55fc
      absl: remove special handling of Condition::kTrue
      absl: remove known_false condition in Mutex code
      There are some test breakages.
      
      PiperOrigin-RevId: 563751370
      Change-Id: Ie14dc799e0a0d286a7e1b47f0a9bbe59dfb23f70
      Abseil Team committed