- 26 Sep, 2023 5 commits
-
-
PiperOrigin-RevId: 568652465 Change-Id: I9f72a11cb514eaf694dae589a19dc139891e7af2
Abseil Team committed -
Siryn's crc32 instruction appears to have a latency of 3 and a throughput of 1, which makes the optimal ratio of pmull to crc streams close to that of the tested x86 machines. Up to 120% faster for large inputs. PiperOrigin-RevId: 568645559 Change-Id: I86b85b1b2a5d4fb3680c516c4c9044238b20fe61
Connal de Souza committed -
PiperOrigin-RevId: 568603611 Change-Id: I7a31e0d6336a7235a8dc6eeed5680625cb3b4298
Derek Mauro committed -
This also adds a test for `operator<<`. PiperOrigin-RevId: 568590367 Change-Id: Ia0ad39cb582e7d24e6c4131827818d8c4b10dfd9
Abseil Team committed -
Adds `absl::Overload()`, which returns a functor that provides overloads based on the functors passed to it. PiperOrigin-RevId: 568476251 Change-Id: Ic625c9b5300d1db496979c178ca1e655581f9276
Abseil Team committed
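The technique behind `absl::Overload()` can be sketched with the well-known C++17 "overloaded" pattern: inherit from every functor and pull all of their call operators into one overload set. The `Overloaded` template and `Describe` function below are hypothetical names; this is a self-contained illustration of the idea, not Abseil's implementation.

```cpp
#include <string>
#include <variant>

// Hypothetical stand-in for the idea behind absl::Overload(): inherit from
// every functor passed in and expose all of their operator()s together.
template <typename... Fs>
struct Overloaded : Fs... {
  using Fs::operator()...;
};

// Deduction guide (required in C++17, implicit in C++20).
template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Usage: dispatch on the active alternative of a variant.
std::string Describe(const std::variant<int, double, std::string>& v) {
  return std::visit(
      Overloaded{
          [](int) { return std::string("int"); },
          [](double) { return std::string("double"); },
          [](const std::string& s) {
            return "string of size " + std::to_string(s.size());
          },
      },
      v);
}
```

Overload resolution picks the best-matching lambda for each alternative, so `Describe(2.5)` selects the `double` overload.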
-
- 23 Sep, 2023 1 commit
-
-
PiperOrigin-RevId: 567869792 Change-Id: I29948282b57b401f3199dc41160538aa9a8079a7
Abseil Team committed
-
- 22 Sep, 2023 1 commit
-
-
PiperOrigin-RevId: 567695227 Change-Id: I13eb8a1872d2fe703b5f3b9bc8df7fec4381fb55
Abseil Team committed
-
- 21 Sep, 2023 4 commits
-
-
PiperOrigin-RevId: 567415671 Change-Id: I59bfcb5ac9fbde227a4cdb3b497b0bd5969b0770
Abseil Team committed -
This is a temporary workaround for an apparent compiler bug with pmull(2) instructions. The current hot loop looks like this:
    mov     w14, #0xef02
    lsl     x15, x15, #6
    mov     x13, xzr
    movk    w14, #0x740e, lsl #16
    sub     x15, x15, #0x40
    ldr     q4, [x16, #0x4e0]
  _LOOP_START:
    add     x16, x9, x13
    add     x17, x12, x13
    fmov    d19, x14                  <--------- This is loop-invariant and expensive
    add     x13, x13, #0x40
    cmp     x15, x13
    prfm    pldl1keep, [x16, #0x140]
    prfm    pldl1keep, [x17, #0x140]
    ldp     x18, x0, [x16, #0x40]
    crc32cx w10, w10, x18
    ldp     x2, x18, [x16, #0x50]
    crc32cx w10, w10, x0
    crc32cx w10, w10, x2
    ldp     x0, x2, [x16, #0x60]
    crc32cx w10, w10, x18
    ldp     x18, x16, [x16, #0x70]
    pmull2  v5.1q, v1.2d, v4.2d
    pmull2  v6.1q, v0.2d, v4.2d
    pmull2  v7.1q, v2.2d, v4.2d
    pmull2  v16.1q, v3.2d, v4.2d
    ldp     q17, q18, [x17, #0x40]
    crc32cx w10, w10, x0
    pmull   v1.1q, v1.1d, v19.1d
    crc32cx w10, w10, x2
    pmull   v0.1q, v0.1d, v19.1d
    crc32cx w10, w10, x18
    pmull   v2.1q, v2.1d, v19.1d
    crc32cx w10, w10, x16
    pmull   v3.1q, v3.1d, v19.1d
    ldp     q20, q21, [x17, #0x60]
    eor     v1.16b, v17.16b, v1.16b
    eor     v0.16b, v18.16b, v0.16b
    eor     v1.16b, v1.16b, v5.16b
    eor     v2.16b, v20.16b, v2.16b
    eor     v0.16b, v0.16b, v6.16b
    eor     v3.16b, v21.16b, v3.16b
    eor     v2.16b, v2.16b, v7.16b
    eor     v3.16b, v3.16b, v16.16b
    b.ne    _LOOP_START
There is a redundant fmov that moves the same constant into a Neon register on every loop iteration, to be used by the PMULL instructions. The PMULL2 instructions already have this constant loaded into Neon registers. After this change, both the PMULL and PMULL2 instructions use the values in q4, and they are not reloaded on every iteration. This fmov was expensive because it contends with the crc32cx instructions for execution units. This is up to 20% faster for large inputs. PiperOrigin-RevId: 567391972 Change-Id: I4c8e49750cfa5cc5730c3bb713bd9fd67657804a
Connal de Souza committed -
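To make the register-reuse point concrete, here is a portable sketch of a CRC-style folding loop in which one 128-bit key pair (the role q4 plays above) is set up once, outside the loop, and both halves are reused by the low and high carry-less multiplies. `Clmul64` is a hypothetical software stand-in for PMULL/PMULL2 so the sketch runs anywhere, and the key values a caller passes are placeholders, not real CRC32C folding constants.

```cpp
#include <cstddef>
#include <cstdint>

struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Software carry-less (GF(2) polynomial) 64x64 -> 128-bit multiply.
// Stands in for PMULL (low halves) / PMULL2 (high halves).
U128 Clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // avoid an undefined shift by 64
    }
  }
  return r;
}

// Folds n 128-bit blocks: acc = clmul(acc.lo, k.lo) ^ clmul(acc.hi, k.hi) ^ next.
// `k` is loaded once; nothing inside the loop re-materializes the constant.
U128 Fold(const U128* blocks, std::size_t n, U128 k) {
  if (n == 0) return U128{0, 0};
  U128 acc = blocks[0];
  for (std::size_t i = 1; i < n; ++i) {
    U128 lo = Clmul64(acc.lo, k.lo);  // PMULL analogue: low halves
    U128 hi = Clmul64(acc.hi, k.hi);  // PMULL2 analogue: high halves
    acc.lo = lo.lo ^ hi.lo ^ blocks[i].lo;
    acc.hi = lo.hi ^ hi.hi ^ blocks[i].hi;
  }
  return acc;
}
```

In the hardware loop the same idea means both PMULL and PMULL2 read their key from one already-loaded vector register instead of moving a scalar into d19 on every iteration.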
Replace BtreeAllocatorTest with individual test cases for copy/move/swap propagation (defined in test_allocator.h) and minimal alignment. Also remove some extraneous value_types from typed tests. The motivation is to reduce btree_test compile time. PiperOrigin-RevId: 567376572 Change-Id: I6ac6130b99faeadaedab8c2c7b05d5e23e77cc1e
Evan Brown committed -
Some regressions were reported. PiperOrigin-RevId: 567181925 Change-Id: I4ee8a61afd336de7ecb22ec307adb2068932bc8b
Dmitry Vyukov committed
-
- 20 Sep, 2023 6 commits
-
-
`SwisstableDebugEnabled()` is also true for release builds with hardening enabled. To minimize their impact in those builds: - use `ABSL_PREDICT_FALSE()` to provide a compiler hint for code layout - use `ABSL_RAW_LOG()` with a format string to reduce code size and improve the chances that the hot paths will be inlined. PiperOrigin-RevId: 567102494 Change-Id: I6734bd491d7b2e1fb9df0e86f4e29e6ad0a03102
Daniel Cheng committed -
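A minimal sketch of the pattern, assuming GCC or Clang: `SKETCH_PREDICT_FALSE` mirrors the `__builtin_expect` hint that `ABSL_PREDICT_FALSE()` is built on, and `RawLogError` is a hypothetical out-of-line, printf-style reporter standing in for `ABSL_RAW_LOG()`. `IsValidSlot` is an invented example check, not a Swisstable function.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_FALSE(x) (__builtin_expect(false || (x), false))
#define SKETCH_NOINLINE __attribute__((noinline, cold))
#else
#define SKETCH_PREDICT_FALSE(x) (x)
#define SKETCH_NOINLINE
#endif

// Hypothetical reporter: one shared, cold, out-of-line call site that takes a
// format string, instead of inlined stream-formatting code at every check.
SKETCH_NOINLINE void RawLogError(const char* format, ...) {
  va_list args;
  va_start(args, format);
  std::vfprintf(stderr, format, args);
  va_end(args);
  std::fputc('\n', stderr);
}

// Hypothetical hardened check: the hot path is just a compare and a branch
// that the compiler is told is unlikely to be taken.
inline bool IsValidSlot(std::size_t index, std::size_t capacity) {
  if (SKETCH_PREDICT_FALSE(index >= capacity)) {
    RawLogError("invalid slot %zu (capacity %zu)", index, capacity);
    return false;
  }
  return true;
}
```

Keeping the failure branch tiny (a single call with scalar arguments) is what leaves the caller small enough to remain a good inlining candidate.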
PiperOrigin-RevId: 567102456 Change-Id: I0750284c36850adbabc5ec0b4a2635aa8a967e53
Abseil Team committed -
Tidy up Mutex::[Reader]TryLock codegen by outlining the slow path and the non-tail function call, and by un-unrolling the loop.
Current codegen: https://gist.githubusercontent.com/dvyukov/a4d353fd71ac873af9332c1340675b60/raw/226537ffa305b25a79ef3a85277fa870fee5191d/gistfile1.txt
New codegen: https://gist.githubusercontent.com/dvyukov/686a094c5aa357025689764f155e5a29/raw/e3125c1cdb5669fac60faf336e2f60395e29d888/gistfile1.txt
    name                                    old cpu/op   new cpu/op    delta
    BM_TryLock                             18.0ns ± 0%  17.7ns ± 0%   -1.64%  (p=0.016 n=4+5)
    BM_ReaderTryLock/real_time/threads:1   17.9ns ± 0%  17.9ns ± 0%   -0.10%  (p=0.016 n=5+5)
    BM_ReaderTryLock/real_time/threads:72  9.61µs ± 8%  8.42µs ± 7%  -12.37%  (p=0.008 n=5+5)
PiperOrigin-RevId: 567006472 Change-Id: Iea0747e71bbf2dc1f00c70a4235203071d795b99
Dmitry Vyukov committed -
PiperOrigin-RevId: 566991965 Change-Id: I6c4d64de79d303e69b18330bda04fdc84d40893d
Dmitry Vyukov committed -
Currently ReaderLock/ReaderUnlock try the CAS only once. Even under moderate contention from other readers only, they go onto the slow path, which does a lot of additional work before retrying the CAS (with only readers present, the slow-path logic is not actually needed for anything). Retry the CAS while there are only readers.
    name                                 old cpu/op   new cpu/op    delta
    BM_ReaderLock/real_time/threads:1   17.9ns ± 0%  17.9ns ± 0%        ~  (p=0.071 n=5+5)
    BM_ReaderLock/real_time/threads:72  11.4µs ± 3%   8.4µs ± 4%  -26.24%  (p=0.008 n=5+5)
PiperOrigin-RevId: 566981511 Change-Id: I432a3c1d85b84943d0ad4776a34fa5bfcf5b3b8e
Dmitry Vyukov committed -
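The retry idea can be sketched with a hypothetical reader/writer word in which bit 0 marks a writer and the remaining bits count readers. This is not Abseil's Mutex word layout and has no waiter bookkeeping; it only shows the shape of "keep retrying the CAS while only readers are present" before falling back to a slow path (here simply reported as `false`).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock word: bit 0 = writer held, higher bits = reader count.
class SketchRwWord {
 public:
  static constexpr std::uintptr_t kWriter = 1;
  static constexpr std::uintptr_t kReader = 2;  // one reader increment

  // Shared-acquire fast path. A failed CAS while no writer is present only
  // means another reader raced us, so retry instead of taking the slow path.
  bool ReaderTryLockFast() {
    std::uintptr_t v = word_.load(std::memory_order_relaxed);
    while ((v & kWriter) == 0) {
      if (word_.compare_exchange_weak(v, v + kReader,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
        return true;
      }
      // On failure compare_exchange_weak reloaded v; re-check the writer bit.
    }
    return false;  // A writer holds the lock: the caller would go slow-path.
  }

  // With no waiter bookkeeping in this sketch, release is one fetch_sub.
  void ReaderUnlockFast() { word_.fetch_sub(kReader, std::memory_order_release); }

  bool WriterTryLock() {
    std::uintptr_t expected = 0;  // no readers, no writer
    return word_.compare_exchange_strong(expected, kWriter,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
  }

  void WriterUnlock() { word_.fetch_and(~kWriter, std::memory_order_release); }

  std::uintptr_t Readers() const {
    return word_.load(std::memory_order_relaxed) / kReader;
  }

 private:
  std::atomic<std::uintptr_t> word_{0};
};
```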
PiperOrigin-RevId: 566961701 Change-Id: Id04e4c5a598f508a0fe7532ae8f084c583865f2d
Dmitry Vyukov committed
-
- 19 Sep, 2023 4 commits
-
-
PiperOrigin-RevId: 566675048 Change-Id: Ie598c21474858974e4b4adbad401c61a38924c98
Abseil Team committed -
PiperOrigin-RevId: 566650311 Change-Id: Ibfabee88ea9999d08ade05ece362f5a075d19695
Abseil Team committed -
Currently Mutex::Lock contains a non-inlined, non-tail call: TryAcquireWithSpinning -> GetMutexGlobals -> LowLevelCallOnce -> init closure. This turns it into a non-leaf function with a stack frame allocation and additional register use. Remove this non-tail call to make the function a leaf, and move the spin-iteration initialization to LockSlow.
Current Lock happy path:
    00000000001edc20 <absl::Mutex::Lock()>:
      1edc20: 55                 push   %rbp
      1edc21: 48 89 e5           mov    %rsp,%rbp
      1edc24: 53                 push   %rbx
      1edc25: 50                 push   %rax
      1edc26: 48 89 fb           mov    %rdi,%rbx
      1edc29: 48 8b 07           mov    (%rdi),%rax
      1edc2c: a8 19              test   $0x19,%al
      1edc2e: 75 0e              jne    1edc3e <absl::Mutex::Lock()+0x1e>
      1edc30: 48 89 c1           mov    %rax,%rcx
      1edc33: 48 83 c9 08        or     $0x8,%rcx
      1edc37: f0 48 0f b1 0b     lock cmpxchg %rcx,(%rbx)
      1edc3c: 74 42              je     1edc80 <absl::Mutex::Lock()+0x60>
      ... unhappy path ...
      1edc80: 48 83 c4 08        add    $0x8,%rsp
      1edc84: 5b                 pop    %rbx
      1edc85: 5d                 pop    %rbp
      1edc86: c3                 ret
New Lock happy path:
    00000000001eea80 <absl::Mutex::Lock()>:
      1eea80: 48 8b 07           mov    (%rdi),%rax
      1eea83: a8 19              test   $0x19,%al
      1eea85: 75 0f              jne    1eea96 <absl::Mutex::Lock()+0x16>
      1eea87: 48 89 c1           mov    %rax,%rcx
      1eea8a: 48 83 c9 08        or     $0x8,%rcx
      1eea8e: f0 48 0f b1 0f     lock cmpxchg %rcx,(%rdi)
      1eea93: 75 01              jne    1eea96 <absl::Mutex::Lock()+0x16>
      1eea95: c3                 ret
      ... unhappy path ...
PiperOrigin-RevId: 566488042 Change-Id: I62f854b82a322cfb1d42c34f8ed01b4677693fca
Dmitry Vyukov committed -
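A simplified sketch of the restructuring, using a hypothetical spin lock rather than `absl::Mutex`: the inline `Lock()` does a single CAS and calls nothing on success, while the one-time initialization of the spin limit (a function-local static here, standing in for the `GetMutexGlobals()` call chain) is reached only from the out-of-line `LockSlow()`. `ComputeSpinLimit` and its value are invented for illustration.

```cpp
#include <atomic>
#include <thread>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LOCK_NOINLINE __attribute__((noinline))
#else
#define SKETCH_LOCK_NOINLINE
#endif

// Hypothetical spin lock; only the call structure is meant to match.
class SketchLock {
 public:
  // Happy path: one CAS and no calls, so it can compile to a leaf function.
  void Lock() {
    bool expected = false;
    if (held_.compare_exchange_strong(expected, true,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
      return;
    }
    LockSlow();  // only reached under contention
  }

  void Unlock() { held_.store(false, std::memory_order_release); }

  bool IsHeld() const { return held_.load(std::memory_order_relaxed); }

 private:
  // Hypothetical policy: spin only if another hardware thread could release.
  static int ComputeSpinLimit() {
    return std::thread::hardware_concurrency() > 1 ? 1000 : 0;
  }

  SKETCH_LOCK_NOINLINE void LockSlow() {
    // Thread-safe one-time initialization now lives on the slow path.
    static const int spin_limit = ComputeSpinLimit();
    int spins = 0;
    while (true) {
      bool expected = false;
      if (!held_.load(std::memory_order_relaxed) &&
          held_.compare_exchange_weak(expected, true,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
        return;
      }
      if (spins < spin_limit) {
        ++spins;  // busy-spin for a bounded number of attempts
      } else {
        std::this_thread::yield();  // then back off to the scheduler
      }
    }
  }

  std::atomic<bool> held_{false};
};
```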
Currently, if a thread has already blocked on a Mutex but then fails to acquire it, we queue it again in FIFO order. As a result, unlucky threads can suffer bad latency if they are requeued several times. The least we can do for them is to queue them in LIFO order after blocking. PiperOrigin-RevId: 566478783 Change-Id: I8bac08325f20ff6ccc2658e04e1847fd4614c653
Dmitry Vyukov committed
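The queueing policy can be sketched with a hypothetical wait queue: first-time waiters join at the back (FIFO), while a waiter that already blocked, was woken, and then lost the race is put back at the front (LIFO) so it does not start over behind newcomers. `WaitQueue` and its methods are illustrative names, not Abseil internals.

```cpp
#include <deque>

class WaitQueue {
 public:
  // A thread blocking for the first time queues behind everyone else.
  void EnqueueNew(int thread_id) { q_.push_back(thread_id); }

  // A thread that was woken but failed to acquire the lock goes to the front.
  void Requeue(int thread_id) { q_.push_front(thread_id); }

  // Wakes the waiter at the front; returns -1 if nobody is waiting.
  int WakeNext() {
    if (q_.empty()) return -1;
    int id = q_.front();
    q_.pop_front();
    return id;
  }

 private:
  std::deque<int> q_;
};
```

For example, if thread 1 is woken, thread 3 arrives, and thread 1 then loses the race, thread 1 is woken next rather than after threads 2 and 3.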
-
- 18 Sep, 2023 1 commit
-
-
This moves the implementation of most methods from absl::Status to absl::status_internal::StatusRep, and ensures that no calls to absl::Status methods are in a .cc file. Stub implementations that check only inlined rep properties and call either no-op (RepToPointer) or out-of-line methods live in status.h. PiperOrigin-RevId: 566187430 Change-Id: I356ec29c0970ffe82eac2a5d98850e647fcd5ea5
Abseil Team committed
-
- 15 Sep, 2023 7 commits
-
-
CondVar wait morphing has a special case for timed waits. The code dates back to 2006; there may have been reasons for it then, but it no longer seems necessary. Wait morphing should work just fine after timed CondVar waits. Remove the special case and simplify the code. PiperOrigin-RevId: 565798838 Change-Id: I4e4d61ae7ebd521f5c32dfc673e57a0c245e7cfb
Dmitry Vyukov committed -
In particular, if ABSL_MIN_LOG_LEVEL exceeds kFatal, these should, upon failure, terminate the program without logging anything. The lack of logging should be visible to the optimizer so that it can strip string literals and stringified variable names from the object file. Making some edge cases work under Clang required rewriting NormalizeLogSeverity to help make constraints on its return value more obvious to the optimizer. PiperOrigin-RevId: 565792699 Change-Id: Ibb6a47d4956191bbbd0297e04492cddc354578e2
Andy Getzendanner committed -
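A sketch of the idea under stated assumptions: `SKETCH_MIN_LOG_LEVEL`, `Severity`, `SKETCH_CHECK` and `ReportCheckFailure` are hypothetical names, not Abseil's macros. Because the minimum level is a compile-time constant, the branch that would print the stringified condition is provably dead when it exceeds `kFatal`, so the optimizer is free to drop the string literal. `NormalizeSeverity` is a hypothetical clamp in the spirit of `NormalizeLogSeverity`, written as plain sequential checks so the range of its result is easy to see.

```cpp
#include <cstdio>
#include <cstdlib>

#ifndef SKETCH_MIN_LOG_LEVEL
#define SKETCH_MIN_LOG_LEVEL 0  // hypothetical build knob
#endif

enum class Severity : int { kInfo = 0, kWarning = 1, kError = 2, kFatal = 3 };

constexpr int kMinLogLevel = SKETCH_MIN_LOG_LEVEL;

// In this sketch: values below kInfo map to kInfo, values above kFatal map
// to kError. Sequential clamps make the result range [kInfo, kFatal] obvious.
constexpr Severity NormalizeSeverity(Severity s) {
  Severity n = s;
  if (n < Severity::kInfo) n = Severity::kInfo;
  if (n > Severity::kFatal) n = Severity::kError;
  return n;
}

[[noreturn]] inline void ReportCheckFailure(const char* condition) {
  std::fprintf(stderr, "Check failed: %s\n", condition);
  std::abort();
}

// When kMinLogLevel > kFatal, the ReportCheckFailure(#cond) branch is dead,
// so the stringified condition need not survive into the object file.
#define SKETCH_CHECK(cond)                                       \
  ((cond) ? static_cast<void>(0)                                 \
          : (kMinLogLevel > static_cast<int>(Severity::kFatal)   \
                 ? std::abort()                                  \
                 : ReportCheckFailure(#cond)))

inline int Half(int x) {
  SKETCH_CHECK(x % 2 == 0);
  return x / 2;
}
```

Building with `-DSKETCH_MIN_LOG_LEVEL=4` would keep the check (and the termination) while removing the message path.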
PiperOrigin-RevId: 565730754 Change-Id: Id828847d32c812736669803c179351433dda4aa6
Evan Brown committed -
Move CountingAllocator into test_allocator.h and add some other allocators that can be shared between different container tests. PiperOrigin-RevId: 565693736 Change-Id: I59af987e30da03a805ce59ff0fb7eeae3fc08293
Evan Brown committed -
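For readers unfamiliar with the pattern, here is a minimal sketch of a byte-counting allocator of the kind such a shared test header might contain. It is hypothetical and not the contents of Abseil's `test_allocator.h`; the counter-pointer constructor and `BytesDuringReserve` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical allocator that records net bytes allocated in a shared counter.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  explicit CountingAllocator(std::int64_t* counter) : bytes_used(counter) {}

  // Rebinding constructor so containers can allocate internal node types.
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other)
      : bytes_used(other.bytes_used) {}

  T* allocate(std::size_t n) {
    *bytes_used += static_cast<std::int64_t>(n * sizeof(T));
    return std::allocator<T>().allocate(n);
  }

  void deallocate(T* p, std::size_t n) {
    *bytes_used -= static_cast<std::int64_t>(n * sizeof(T));
    std::allocator<T>().deallocate(p, n);
  }

  std::int64_t* bytes_used;  // counter owned by the test
};

// Allocators compare equal when they share a counter.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return a.bytes_used == b.bytes_used;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return !(a == b);
}

// Usage: bytes held by a vector right after reserve(n).
std::int64_t BytesDuringReserve(std::size_t n) {
  std::int64_t bytes = 0;
  CountingAllocator<int> alloc(&bytes);
  std::vector<int, CountingAllocator<int>> v(alloc);
  v.reserve(n);
  const std::int64_t during = bytes;
  return during;
}
```

A test can then assert that the counter returns to zero once the container is destroyed, which catches leaks and mismatched deallocation sizes.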
Allow const-qualified FunctionRef instances. This makes the signature compatible with AnyInvocable for const uses. PiperOrigin-RevId: 565682320 Change-Id: I924dadf110481e572bdb8af0111fa62d6f553d90
Abseil Team committed -
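A self-contained sketch of why a const-qualified signature can simply reuse the unqualified one: if a non-owning function reference already invokes its target through a const reference, `R(Args...) const` adds no new requirement, so that specialization can inherit everything. `FnRef` and `ApplyTwice` are hypothetical names; this is not Abseil's `FunctionRef` implementation, and it accepts callable objects (such as lambdas) only.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

template <typename Sig>
class FnRef;

// Non-owning reference to a callable object; the callable must outlive it.
template <typename R, typename... Args>
class FnRef<R(Args...)> {
 public:
  template <typename F, typename = std::enable_if_t<
                            !std::is_base_of_v<FnRef, std::decay_t<F>>>>
  FnRef(const F& f)
      : obj_(std::addressof(f)),
        invoke_([](const void* obj, Args... args) -> R {
          // The target is always called through a const reference.
          return (*static_cast<const F*>(obj))(std::forward<Args>(args)...);
        }) {}

  R operator()(Args... args) const {
    return invoke_(obj_, std::forward<Args>(args)...);
  }

 private:
  const void* obj_;
  R (*invoke_)(const void*, Args...);
};

// Const-qualified signature: nothing extra to enforce, so reuse the base.
template <typename R, typename... Args>
class FnRef<R(Args...) const> : public FnRef<R(Args...)> {
 public:
  using FnRef<R(Args...)>::FnRef;
};

// Usage: a parameter spelled with the same const signature one would give
// an AnyInvocable.
int ApplyTwice(FnRef<int(int) const> f, int x) { return f(f(x)); }
```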
1. Remove the special handling of Condition::kTrue. Condition::kTrue is used very rarely (its uses frequently even indicate confusion and bugs), yet we pay a few additional branches for kTrue on all Condition operations. Remove that special handling and simplify the logic.
2. Remove the known_false condition in Mutex code. Checking the known_false condition only causes a slowdown because:
   1. We have already built a skip list with equivalent conditions (and keep improving it on every Skip call), and when building it we used the more capable GuaranteedEqual function (it checks not just for equality of pointers, but also for equality of function/arg).
   2. Condition pointers are rarely equal, even for equivalent conditions, because temporary Condition objects are usually created on the stack.
   We could call GuaranteedEqual(cond, known_false) instead of cond == known_false, but that slows things down even more (see sub-point 1). So remove the known_false optimization.
Benchmark results for this and the previous change:
    name                         old cpu/op   new cpu/op    delta
    BM_ConditionWaiters/0/1     36.0ns ± 0%  34.9ns ± 0%   -3.02%  (p=0.008 n=5+5)
    BM_ConditionWaiters/1/1     36.0ns ± 0%  34.9ns ± 0%   -2.98%  (p=0.008 n=5+5)
    BM_ConditionWaiters/2/1     35.9ns ± 0%  34.9ns ± 0%   -3.03%  (p=0.016 n=5+4)
    BM_ConditionWaiters/0/8     55.5ns ± 5%  49.8ns ± 3%  -10.33%  (p=0.008 n=5+5)
    BM_ConditionWaiters/1/8     36.2ns ± 0%  35.2ns ± 0%   -2.90%  (p=0.016 n=5+4)
    BM_ConditionWaiters/2/8     53.2ns ± 7%  48.3ns ± 7%        ~  (p=0.056 n=5+5)
    BM_ConditionWaiters/0/64     295ns ± 1%   254ns ± 2%  -13.73%  (p=0.008 n=5+5)
    BM_ConditionWaiters/1/64    36.2ns ± 0%  35.2ns ± 0%   -2.85%  (p=0.008 n=5+5)
    BM_ConditionWaiters/2/64     290ns ± 6%   250ns ± 4%  -13.68%  (p=0.008 n=5+5)
    BM_ConditionWaiters/0/512   5.50µs ±12%  4.99µs ± 8%        ~  (p=0.056 n=5+5)
    BM_ConditionWaiters/1/512   36.7ns ± 3%  35.2ns ± 0%   -4.10%  (p=0.008 n=5+5)
    BM_ConditionWaiters/2/512   4.44µs ±13%  4.01µs ± 3%   -9.74%  (p=0.008 n=5+5)
    BM_ConditionWaiters/0/4096   104µs ± 6%   101µs ± 3%        ~  (p=0.548 n=5+5)
    BM_ConditionWaiters/1/4096  36.2ns ± 0%  35.1ns ± 0%   -3.03%  (p=0.008 n=5+5)
    BM_ConditionWaiters/2/4096  90.4µs ± 5%  85.3µs ± 7%        ~  (p=0.222 n=5+5)
    BM_ConditionWaiters/0/8192   384µs ± 5%   367µs ± 7%        ~  (p=0.222 n=5+5)
    BM_ConditionWaiters/1/8192  36.2ns ± 0%  35.2ns ± 0%   -2.84%  (p=0.008 n=5+5)
    BM_ConditionWaiters/2/8192   363µs ± 3%   316µs ± 7%  -12.84%  (p=0.008 n=5+5)
PiperOrigin-RevId: 565669535 Change-Id: I5180c4a787933d2ce477b004a111853753304684
Dmitry Vyukov committed -
PiperOrigin-RevId: 565662176 Change-Id: I18d5d9eb444b0090e3f4ab8f66ad214a67344268
Abseil Team committed
-
- 14 Sep, 2023 1 commit
-
-
PiperOrigin-RevId: 565330231 Change-Id: I84f0e9065986bb592b5bfb196b3fc221feb14bc4
Abseil Team committed
-
- 13 Sep, 2023 2 commits
-
-
PiperOrigin-RevId: 565050503 Change-Id: I8f4c463be4ef513a2788745d1b454a7ede489152
Abseil Team committed -
PiperOrigin-RevId: 565040001 Change-Id: I1c2e715c97375754c8d863132be2c388265ca4ad
Abseil Team committed
-
- 12 Sep, 2023 3 commits
-
-
PiperOrigin-RevId: 564779671 Change-Id: I8cae825a533a00ff1983b48782486d5d00dae69a
Abseil Team committed -
This should enable binary-size savings now, and further efficiency improvements with the small buffer optimization. PiperOrigin-RevId: 564741270 Change-Id: Icf204d88256243eb60464439a52dd589d7a559cb
Evan Brown committed -
Add a flat_hash_set test verifying that we use value_type member functions to read/write value_types when we aren't allowed to memcpy them. The motivation is to prevent a bug in the small buffer optimization for swisstables. PiperOrigin-RevId: 564726325 Change-Id: Id0df5d28d65c7586428001fcb266886988cd481e
Evan Brown committed
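A minimal sketch of the distinction such a test guards, assuming a hypothetical `TransferSlot` helper (not the Swisstable slot-policy API): trivially copyable values may be relocated with `memcpy`, while anything else must go through its move constructor and destructor. For example, libstdc++'s `std::string` keeps a pointer into its own inline buffer, so a byte copy would leave the destination pointing into the source.

```cpp
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Relocates *src into uninitialized storage at dst and returns the new object;
// afterwards src is raw storage.
template <typename T>
T* TransferSlot(void* dst, T* src) {
  if constexpr (std::is_trivially_copyable_v<T>) {
    std::memcpy(dst, src, sizeof(T));  // bytes are the whole value
    return std::launder(static_cast<T*>(dst));
  } else {
    T* moved = ::new (dst) T(std::move(*src));  // use the type's own members
    std::destroy_at(src);
    return moved;
  }
}

// Usage: relocate a std::string through raw storage and return its value.
std::string RelocateString(const std::string& value) {
  alignas(std::string) unsigned char src_buf[sizeof(std::string)];
  alignas(std::string) unsigned char dst_buf[sizeof(std::string)];
  std::string* src = ::new (static_cast<void*>(src_buf)) std::string(value);
  std::string* dst = TransferSlot(dst_buf, src);
  std::string result = std::move(*dst);
  std::destroy_at(dst);
  return result;
}
```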
-
- 11 Sep, 2023 2 commits
-
-
65d7b6d4 changed StrCat() to not use an intermediate buffer when the result fits in the SSO buffer, but only libc++ has an SSO buffer large enough for this optimization to work. PiperOrigin-RevId: 564447163 Change-Id: I0c7fa4afed3369b36e13e7d1691eb7f933ea0091
Derek Mauro committed -
PiperOrigin-RevId: 564296635 Change-Id: I13ca663cdb676948a7041c5671b82a97a4388ff1
Abseil Team committed
-
- 08 Sep, 2023 3 commits
-
-
We have no intention of using it instead of the CordRepBtree implementation, so clean up and remove all related code and references. PiperOrigin-RevId: 563803813 Change-Id: I95a67318d0f722f3eb7ecdcc7b6c87e28f2e26dd
Martijn Vels committed -
It sorts NaNs, and the test became flaky. The flakiness arises because the sorting checks randomly select and verify only 100 elements, while this test sorts around a thousand. PiperOrigin-RevId: 563783036 Change-Id: Id25bcb47483acf9c40be3fd1747c37d046197330
Abseil Team committed -
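For context, `operator<` on floating-point values is not a strict weak ordering once NaNs are involved: 1.0 and 2.0 are each "equivalent" to NaN (neither compares less), yet 1.0 < 2.0, so equivalence is not transitive. One common remedy, sketched below with a hypothetical `NanLastLess` comparator, is to order every NaN after all other values.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Strict weak ordering for doubles: every non-NaN value sorts before NaN,
// and all NaNs are equivalent to each other.
bool NanLastLess(double a, double b) {
  if (std::isnan(a)) return false;  // NaN is never less than anything
  if (std::isnan(b)) return true;   // any non-NaN value is less than NaN
  return a < b;
}

std::vector<double> SortNanLast(std::vector<double> v) {
  std::sort(v.begin(), v.end(), NanLastLess);
  return v;
}
```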
absl: remove special handling of Condition::kTrue
absl: remove known_false condition in Mutex code
There are some test breakages. PiperOrigin-RevId: 563751370 Change-Id: Ie14dc799e0a0d286a7e1b47f0a9bbe59dfb23f70
Abseil Team committed
-